
Thread: Maybe this will accelerate LZ4 decompression?

  1. #1
    Member lz77's Avatar
    Join Date
    Jan 2016
    Location
    Russia
    Posts
    53
    Thanks
    16
    Thanked 11 Times in 7 Posts

    Maybe this will accelerate LZ4 decompression?

    See LZ4_decompress_generic. We can avoid the separate operations
    Code:
    length = token >> 4;
    length = token & 15;
    when llll < 15 and mmmm < 15. We need to split off the rare cases when llll == 15 or mmmm == 15. Here is a sketch in asm:
    Code:
    xor ecx,ecx
    xor ebx,ebx
    mov cl,[esi]         ; cl = token (llllmmmm)
    mov bx,[esi+1]
    neg ebx              ; ebx = -offset
    add esi,3
    shl ecx,4            ; cl = mmmm*16, ch = llll
    
    cmp ch,$0F
    je .too_many_literals
    
    ===
    
    ; Copying literals, ch == count of literals  ...
    
    ===
    
    cmp cl,$F0
    je .too_big_matchlen
    
    movzx ecx,cl                   ; clear ch so that ecx == mmmm*16
    ; To copy 4 bytes at a time, the offset must be greater than 3
    cmp ebx,-4
    jg .1
    
    ; Copying matched bytes, cl == (count-4)*16 of matched bytes
    
    mov eax,[edi+ebx]              ; match source is the already-decoded output at edi-offset
    mov [edi],eax                  ; copy the first 4 bytes
    add edi,4
    
    .while cl >= 4*16
    mov eax,[edi+ebx]
    mov [edi],eax                 ; copy 4 bytes at a time
    add edi,4
    sub cl,4*16                  ; sub cl,imm is as fast as dec cl
    .endwhile
    
    jmp .2
    .1:
    add ecx,4*16                 ; byte path copies all count bytes; ecx is used because cl alone would wrap for mmmm >= 12
    
    .2:
    .while ecx
    mov al,[edi+ebx]
    mov [edi],al                 ; copy 1 byte at a time
    inc edi
    sub ecx,16
    .endwhile
    What do you think about it?
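
For readers who prefer C, the token handling in the schema above can be sketched roughly as follows. This is only an illustration, not code from lz4: `token_fields`, `split_token` and `match_len_fast` are hypothetical names, and the sketch assumes the LZ4 convention that the stored match field is the match length minus 4. A single shift leaves llll and mmmm*16 in separate bytes, just as `shl ecx,4` leaves them in ch and cl, and one flag says whether the common case (neither field equal to 15) applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types, not part of lz4. */
typedef struct {
    unsigned lit_len;    /* llll, the value the asm leaves in ch        */
    unsigned match_x16;  /* mmmm*16, the value the asm leaves in cl     */
    int      fast_path;  /* 1 when neither field is the escape value 15 */
} token_fields;

/* Split a token (high nibble llll, low nibble mmmm) with one shift,
   mirroring "shl ecx,4" in the asm sketch. */
static token_fields split_token(uint8_t token)
{
    unsigned t = (unsigned)token << 4;  /* bits 8..11 = llll, bits 4..7 = mmmm */
    token_fields f;
    f.lit_len   = t >> 8;
    f.match_x16 = t & 0xFF;
    f.fast_path = (f.lit_len != 0x0F) && (f.match_x16 != 0xF0);
    return f;
}

/* Match length on the fast path only; assumes the stored field is
   (match length - 4), as in the asm comment "cl == (count-4)*16". */
static unsigned match_len_fast(token_fields f)
{
    assert(f.fast_path);
    return (f.match_x16 >> 4) + 4;
}
```

For example, the token 0x3A splits into 3 literals and a match field of 0xA0, which corresponds to a 14-byte match; a token such as 0xF2 or 0x2F clears `fast_path` and would go to the rarely taken branch.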
    Last edited by lz77; 14th November 2017 at 10:27.

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    870
    Thanks
    471
    Thanked 264 Times in 109 Posts
    This is an interesting idea.
    Do you think you could benchmark it?

  3. #3
    Member lz77's Avatar
    I can benchmark it, but only later. At the moment I do not have a decompression function in asm that uses such codewords.

  4. #4
    Member
    It does work.

    We have observed some (variable) decompression speed gains, ranging from 0 to 10%,
    depending on the file, CPU and compiler version.
    The impact varies significantly, but since it seems neutral at worst, it is a good gain on average.

    Therefore, this modification will be present in the next lz4 release.

  5. #5
    Member lz77's Avatar
    I added a condition check: to copy 4 bytes at a time, the offset must be greater than 3. (Sorry, I forgot it.)
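
The reason for that check can be illustrated with a small C sketch. It is a hypothetical helper (`copy_match` is not an lz4 function) written under the assumption that matches are copied from the already-decoded output at `dst - offset`. When the offset is below 4, a 4-byte load would read bytes that have not been written yet, so the whole match is copied byte by byte; otherwise 4-byte chunks are safe and only the tail is copied bytewise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not lz4's actual code: copy `len` match bytes to `dst`
   from `offset` bytes back in the already-decoded output.
   Returns the new end of the output. */
static uint8_t *copy_match(uint8_t *dst, size_t offset, size_t len)
{
    const uint8_t *src = dst - offset;
    assert(offset > 0);

    if (offset >= 4) {
        /* Each 4-byte source chunk ends at or before dst, so it is fully written. */
        while (len >= 4) {
            memcpy(dst, src, 4);
            dst += 4;
            src += 4;
            len -= 4;
        }
    }
    /* Tail bytes, or the whole match when offset < 4: a bytewise copy re-reads
       bytes written a moment earlier, which is what repeats short patterns. */
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
    return dst;
}
```

With a buffer that starts with "ab", calling `copy_match(buf + 2, 2, 6)` extends it to "abababab"; the same call done with 4-byte loads would pick up two bytes beyond the current end of the output.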
