
Thread: Compilers are crazy

  1. #1
    Shelwien (Administrator)
    Join Date: May 2008
    Location: Kharkov, Ukraine
    Posts: 3,267
    Thanks: 200
    Thanked 985 Times in 511 Posts


    Here's a question: why is this code
    Code:
    uint lpb = (p1>hSCALE);
    p1 = p1 + ((SCALE-p1-p1)&(-lpb));
    p1 = qmap[(p1+mask)>>shift];
    bit ^= lpb;
    is slower by 0.3s (in enwik8 processing) than

    Code:
    uint lpb = int(hSCALE-p1)>>31;
    p1 = p1 - ((-SCALE+p1+p1)&lpb);
    p1 = qmap[(p1+mask)>>shift];
    bit ^= -lpb;
    ?

    Hint: there are no branches in either case, and the 2nd version contains two
    workarounds for IntelC weirdness.
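    To confirm that the two snippets really compute the same thing, here is a minimal C++ sketch that runs both over every p1 and compares the resulting table index and bit. The constants are assumptions read off the listings in the reply (SCALE = 32768, hSCALE = 16384, mask = 31, shift = 5), p1 is assumed to lie in [0, SCALE], and the qmap lookup is left out because its contents don't affect the comparison. step_v1, step_v2 and variants_agree are hypothetical names. The 2nd version relies on int(x)>>31 being an arithmetic shift, which C++20 guarantees and which x86 compilers emit as sar.

```cpp
#include <cassert>

typedef unsigned int uint;

// Assumed constants, read off the listings (add 32768 / 16384,
// lea [31+...], sar 5); the real values come from the original program.
const uint SCALE  = 1u << 15;  // 32768
const uint hSCALE = SCALE / 2; // 16384
const uint mask   = 31;
const uint shift  = 5;

// 1st version: 0/1 flag, negated into an AND mask.
// Returns the qmap index and updates bit; the lookup itself is omitted.
uint step_v1(uint p1, uint& bit) {
  assert(p1 <= SCALE);
  uint lpb = (p1 > hSCALE);
  p1 = p1 + ((SCALE - p1 - p1) & (-lpb));  // p1 -> SCALE - p1 when lpb
  bit ^= lpb;
  return (p1 + mask) >> shift;
}

// 2nd version: all-ones mask straight from an arithmetic shift of the sign bit.
uint step_v2(uint p1, uint& bit) {
  assert(p1 <= SCALE);
  uint lpb = int(hSCALE - p1) >> 31;     // 0 or 0xFFFFFFFF
  p1 = p1 - ((-SCALE + p1 + p1) & lpb);  // same mirroring: p1 -> SCALE - p1
  bit ^= -lpb;                           // -0xFFFFFFFF == 1 in unsigned math
  return (p1 + mask) >> shift;
}

// Compare both versions for every p1 in [0, SCALE] and both bit values.
bool variants_agree() {
  for (uint p1 = 0; p1 <= SCALE; ++p1)
    for (uint b = 0; b <= 1; ++b) {
      uint bit1 = b, bit2 = b;
      if (step_v1(p1, bit1) != step_v2(p1, bit2) || bit1 != bit2)
        return false;
    }
  return true;
}
```

    variants_agree() should return true. For example, p1 = 20000 is mirrored to 12768 by both versions, giving index 399 and flipping bit.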

  2. #2
    Shelwien (Administrator)
    Well, the answer:

    IntelC computes -(p1>hSCALE) as -(uint(-p1+hSCALE)>>31). I guess the code
    generator has a template for (x>y) that gets applied after the arithmetic
    optimizations.
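    The rewrite can be checked directly. This hedged C++ sketch assumes hSCALE = 16384 (the constant added in the first listing) and p1 in [0, 2*hSCALE]. Under that assumption the 0/1 compare written in the source, the form IntelC emitted (mov/neg/add/shr, then neg), and the arithmetic-shift form of the 2nd version all produce the same 0 or all-ones mask; the last one simply needs no trailing neg. All function names are hypothetical.

```cpp
#include <cassert>

typedef unsigned int uint;

const uint hSCALE = 1u << 14;  // assumed: matches "add ebp, 16384" in the listing

// As written in the 1st version: 0/1 compare, negated into an AND mask.
uint mask_as_written(uint p1) { return -uint(p1 > hSCALE); }

// What IntelC emitted for it (mov/neg/add/shr, then neg): the sign bit of
// hSCALE-p1, extracted with a logical shift, then negated.
uint mask_as_compiled(uint p1) { return -(uint(-p1 + hSCALE) >> 31); }

// 2nd version: ask for the arithmetic shift directly (mov/neg/add/sar),
// which already yields 0 or all-ones, so the trailing neg disappears.
uint mask_variant2(uint p1) { return uint(int(hSCALE - p1) >> 31); }

// Exhaustive check over the assumed range p1 in [0, 2*hSCALE].
void check_compare_masks() {
  for (uint p1 = 0; p1 <= 2 * hSCALE; ++p1) {
    assert(mask_as_written(p1) == mask_as_compiled(p1));
    assert(mask_as_written(p1) == mask_variant2(p1));
  }
}
```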

    And (SCALE-p1-p1) is computed as (-(p1+p1)+SCALE), this time because LEA gives
    a fast way to get x*2, but not x*-2. But again, only the x86 backend knows about
    LEA, and apparently it can't request arithmetic transformations from the earlier
    optimization passes.
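    To make the LEA point concrete, here is a small C++ model of LEA's address arithmetic: base + index*scale + displacement, modulo 2^32, with scale limited to 1, 2, 4 or 8. It is a hypothetical illustration, not compiler code. Since there is no negative scale, SCALE-p1-p1 takes lea+neg+add (as in the first listing), while -SCALE+p1+p1 is a single lea with a displacement (as in the second). SCALE = 32768 is assumed from the listings; lea, scale_minus_2p and two_p_minus_scale are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Model of x86 LEA arithmetic: base + index*scale + disp, wrapping mod 2^32.
// Only scales 1, 2, 4, 8 exist, so x*-2 cannot be formed in one LEA.
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, int32_t disp) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return base + index * scale + uint32_t(disp);
}

const uint32_t SCALE = 1u << 15;  // assumed: matches "add edx, 32768"

// 1st version wants SCALE-p1-p1: lea for 2*p1, then neg, then add (3 ops).
uint32_t scale_minus_2p(uint32_t p1) {
  uint32_t t = lea(p1, p1, 1, 0);  // lea edx, [ebx+ebx]
  t = 0u - t;                      // neg edx
  return t + SCALE;                // add edx, 32768
}

// 2nd version asks for -SCALE+p1+p1 instead: one lea with a displacement.
uint32_t two_p_minus_scale(uint32_t p1) {
  return lea(p1, p1, 1, -int32_t(SCALE));  // lea edx, [-32768+ebx+ebx]
}
```

    The two results are negatives of each other, which is why the 2nd version subtracts the masked value from p1 instead of adding it.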

    Btw, the same LEA issue also applies to hSCALE-p1: it's possible to calculate
    -hSCALE+p1 with one instruction, while hSCALE-p1 requires two, so there's
    actually a third version, which improved the time by another 0.2s.
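    Here is a hedged sketch of why the third version's mask is correct. The value p1-(hSCALE+1) is negative exactly when p1 <= hSCALE, so the arithmetic shift gives all-ones there, and ~ flips the result. Because the operand has the form "register plus constant", it fits a single LEA (the lea with -16385 in the third listing). hSCALE = 16384 and the range p1 in [0, 2*hSCALE] are assumptions; function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t hSCALE = 1u << 14;  // assumed; -(hSCALE+1) = -16385 as in the third listing

// 2nd version: hSCALE - p1 is "constant minus register" (mov/neg/add, then sar).
uint32_t mask_v2(uint32_t p1) { return uint32_t(int32_t(hSCALE - p1) >> 31); }

// 3rd version: -(hSCALE+1) + p1 is "register plus constant", so one LEA
// produces it. Its sign bit is set exactly when p1 <= hSCALE, so the shifted
// value is inverted with ~ to get all-ones when p1 > hSCALE.
uint32_t mask_v3(uint32_t p1) {
  return uint32_t(~(int32_t(-(hSCALE + 1) + p1) >> 31));
}

// Exhaustive check over the assumed range; relies on arithmetic >> of
// negative ints (guaranteed since C++20, and what sar does).
void check_v3_masks() {
  for (uint32_t p1 = 0; p1 <= 2 * hSCALE; ++p1) {
    assert(mask_v3(p1) == mask_v2(p1));
    assert(mask_v3(p1) == (p1 > hSCALE ? 0xFFFFFFFFu : 0u));
  }
}
```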

    I think this is interesting, because I don't remember seeing this kind of
    optimization in the manuals.

    Code:
    ;;;         uint lpb = (p1>hSCALE);
    ;;;         p1 = p1 + ((SCALE-p1-p1)&(-lpb));
    ;;;         p1 = qmaq[(p1+mask)>>shift];
    ;;;         bit ^= lpb;
    
            dec       DWORD PTR [76+esi]                            ;76.11
            lea       edx, DWORD PTR [ebx+ebx]                      ;68.27
            neg       edx                                           ;68.27
            mov       ebp, ebx                                      ;69.9
            add       edx, 32768                                    ;68.30
            neg       ebp                                           ;69.9
            add       ebp, 16384                                    ;69.9
            shr       ebp, 31                                       ;69.9
            neg       ebp                                           ;68.36
            and       edx, ebp                                      ;68.36
            lea       eax, DWORD PTR [31+ebx+edx]                   ;69.23
            sar       eax, 5                                        ;69.30
            movzx     ebp, WORD PTR [?qmaq@@3PAGA+eax*2]            ;69.14

    Code:
    ;;;         uint lpb = int(hSCALE-p1)>>31;
    ;;;         p1 = p1 - ((-SCALE+p1+p1)&lpb);
    ;;;         p1 = qmaq[(p1+mask)>>shift];
    ;;;         bit ^= -lpb;
    ;;; 
            dec       DWORD PTR [76+esi]                            ;77.11
            mov       ebp, ebx                                      ;68.31
            neg       ebp                                           ;68.31
            add       ebp, 16384                                    ;68.31
            sar       ebp, 31                                       ;68.36
            lea       edx, DWORD PTR [-32768+ebx+ebx]               ;69.31
            and       edx, ebp                                      ;69.35
            sub       ebx, edx                                      ;69.35
            add       ebx, 31                                       ;70.23
            sar       ebx, 5                                        ;70.30
            movzx     ebx, WORD PTR [?qmaq@@3PAGA+ebx*2]            ;70.14

    Code:
    ;;;         uint lpb = ~(int(-(hSCALE+1)+p1)>>31);
    ;;;         p1 = p1 - ((-SCALE+p1+p1)&lpb);
    ;;;         p1 = qmaq[(p1+mask)>>shift];
    ;;;         bit ^= -lpb;
    
            dec       DWORD PTR [76+ebp]                            ;77.11
            lea       edx, DWORD PTR [-16385+esi+edi]               ;68.38
            sar       edx, 31                                       ;68.43
            not       edx                                           ;68.43
            lea       edi, DWORD PTR [-32768+ebx+ebx]               ;69.31
            and       edi, edx                                      ;69.35
            sub       ebx, edi                                      ;69.35
            add       ebx, 31                                       ;70.23
            sar       ebx, 5                                        ;70.30
            movzx     edi, WORD PTR [?qmaq@@3PAGA+ebx*2]            ;70.14
