Results 1 to 19 of 19

Thread: LZSSE: fast x86_64 SIMD decompressor

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    can
    Posts
    1
    Thanks
    0
    Thanked 11 Times in 1 Post

    LZSSE: fast x86_64 SIMD decompressor

    http://conorstokes.github.io/compres...-decompression

    https://github.com/ConorStokes/LZSSE

    enwiki8 on some specific system:

    Compressor name Compression Decompress. Compr. size Ratio
    memcpy 11392 MB/s 11411 MB/s 100000000 100.00
    lz4 r131 440 MB/s 2623 MB/s 57262281 57.26
    lz4fast r131 level 3 522 MB/s 2525 MB/s 63557747 63.56
    lz4fast r131 level 17 1157 MB/s 3916 MB/s 86208275 86.21
    lz4hc r131 level 12 31 MB/s 2591 MB/s 42196886 42.20
    lz4hc r131 level 16 31 MB/s 2605 MB/s 42196873 42.20
    lzham 1.0 -d26 level 0 8.68 MB/s 183 MB/s 37932690 37.93
    lzham 1.0 -d26 level 1 2.48 MB/s 265 MB/s 29545382 29.55
    lzsse4 0.1 212 MB/s 3128 MB/s 47108403 47.11
    lzsse8 0.1 214 MB/s 3107 MB/s 47249359 47.25
    lzsse2 0.1 2.88 MB/s 3759 MB/s 38060594 38.06
    zlib 1.2.8 level 9 17 MB/s 308 MB/s 36475792 36.48
    zstd v0.4.1 level 13 7.98 MB/s 382 MB/s 30761914 30.76
    zstd v0.4.1 level 17 3.58 MB/s 340 MB/s 28907539 28.91
    zstd v0.4.1 level 20 2.20 MB/s 241 MB/s 27195437 27.20
    Last edited by freshmeat; 17th February 2016 at 14:01.

  2. Thanks (11):

    Bulat Ziganshin (18th February 2016),comp1 (17th February 2016),Cyan (17th February 2016),DirtyPunk (20th February 2016),encode (17th February 2016),fhanau (17th February 2016),inikep (17th February 2016),JamesB (18th February 2016),ne0n (18th February 2016),nemequ (17th February 2016),Paul W. (17th February 2016)

  3. #2
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    The specific machine was an i7 4790 at stock clocks, running Windows 10, I usually include it but it slipped through this time. I'll update the post to reflect this. Happy to discuss this if anyone has any questions?

  4. #3
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    Is there a windows-binary (x64?) for a simple test - may be to compress / decompress a single file?

    if i read lzss then a first remember:
    http://encode.su/threads/143-LZSS-v0-01-is-here!

    source code: http://encode.su/attachment.php?atta...3&d=1332780406

    best regards

  5. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Please compare it to the Nakamichi family. So far these have been the fastest decompressing LZ algorithms.

  6. #5
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    340
    Thanks
    194
    Thanked 58 Times in 42 Posts
    Could someone compile a Win32 binary for testing?

    I tried compiling with no success.

  7. #6
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    277
    Thanks
    116
    Thanked 159 Times in 116 Posts
    Quote Originally Posted by comp1 View Post
    Could someone compile a Win32 binary for testing?

    I tried compiling with no success.
    Isn't it only for 64 bit? (LZSSE: fast x86_64 SIMD decompressor)
    In An LZ Codec Designed for SSE Decompression there is:
    About two weeks ago a few ideas clicked together and I decided to start a new experiment, building an SSE/x64 oriented LZSS format, ...
    and
    It’s also important to note, this code is targeted at 64bit/SSE 4.1 and it eats pretty much all of the registers (and narrowly avoids spills). You’d probably want to do the implementation differently for targeting 32bit x86.

  8. #7
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    Currently I've been mainly testing it inside a version of lzbench, which I can provide a binary for, although I don't believe that will actually round-trip a file (just do the compression/decompression in memory). I'll do up a binary single file version this weekend.

    Quote Originally Posted by joerg View Post
    Is there a windows-binary (x64?) for a simple test - may be to compress / decompress a single file?

    if i read lzss then a first remember:
    http://encode.su/threads/143-LZSS-v0-01-is-here!

    source code: http://encode.su/attachment.php?atta...3&d=1332780406

    best regards

  9. #8
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    Even if you do get it compiling for Win32, it won't perform anywhere near as well due to the lack of registers on 32bit x86.

    Quote Originally Posted by Mauro Vezzosi View Post
    Isn't it only for 64 bit? (LZSSE: fast x86_64 SIMD decompressor)
    In An LZ Codec Designed for SSE Decompression there is:
    About two weeks ago a few ideas clicked together and I decided to start a new experiment, building an SSE/x64 oriented LZSS format, ...
    and
    It’s also important to note, this code is targeted at 64bit/SSE 4.1 and it eats pretty much all of the registers (and narrowly avoids spills). You’d probably want to do the implementation differently for targeting 32bit x86.

  10. #9
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    Any advice on taming Nakamichi into a test-harness and which variant to use?

    Quote Originally Posted by m^2 View Post
    Please compare it to the Nakamichi family. So far these have been the fastest decompressing LZ algorithms.

  11. #10
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Conor, I sent you email, not sure if you got it. I would report privately.

    Very interesting ideas, nice work. I think there're interesting possibilities along these lines.

    I'm trying to test performance, but can't really do it properly because the LZSSE2 encoder takes over 24 hours on some small files.

    The LZSSE2 Optimal Parser appears to be O(N^3)

    You can test it just by running on a file that's all zero bytes. For other stress tests check out http://www.cbloom.com/rants.html

    There are two standard fixes :

    1. When the optimal parser finds a match > 64 bytes (or whatever), just take the match and step ahead the parse position by matchlen. Don't parse at every position inside the match.

    2. In the match finder, "amortize" the search; that is, cap the number of tree steps at some max # (64 or whatever)
    (this is "cutValue" in LZMA LzFind)

    Alternatively, I think you could just use LzFind from LZMA.

    It's also a decent compromise to just use a Cache Table Match Finder with an optimal parse. This gives less compression but is a good space-speed tradeoff.


    Also may I suggest that any encoder that runs slower than 1 MB/s should print incremental progress reports (% done or something) so we have some idea where it is in long runs
    Last edited by cbloom; 3rd August 2016 at 20:01.

  12. Thanks (2):

    Bulat Ziganshin (18th February 2016),DirtyPunk (18th February 2016)

  13. #11
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    Hi Charles,
    I did get your email (and replied, which hopefully hasn't gotten lost). Thanks by the way!

    It's a current priority to replace the LZSSE2 match finding with something better (in fact, to work on the compression side in general, the compressors were really thrown together very quickly), it's terrible. I was planning on replacing it with a suffix array based matcher when I get some spare time, but in the short term LzFind, with the skipping ahead in the parse with large matches, sounds like a better option.

    Incremental progress reports are a good idea, I just hope LZSSE2 hasn't wasted too much of your time .

    Quote Originally Posted by cbloom View Post
    Conor, I sent you email, not sure if you got it. I would report privately.

    Very interesting ideas, nice work. I think there're interesting possibilities along these lines.

    I'm trying to test performance, but can't really do it properly because the LZSSE2 encoder takes over 24 hours on some small files.

    The LZSSE2 Optimal Parser appears to be O(N^3)

    You can test it just by running on a file that's all zero bytes. For other stress tests check out http://www.cbloom.com/rants.html

    There are two standard fixes :

    1. When the optimal parser finds a match > 64 bytes (or whatever), just take the match and step ahead the parse position by matchlen. Don't parse at every position inside the match.

    2. In the match finder, "amortize" the search; that is, cap the number of tree steps at some max # (64 or whatever)
    (this is "cutValue" in LZMA LzFind)

    Alternatively, I think you could just use LzFind from LZMA.

    It's also a decent compromise to just use a Cache Table Match Finder with an optimal parse. This gives less compression but is a good space-speed tradeoff.


    Also may I suggest that any encoder that runs slower than 1 MB/s should print incremental progress reports (% done or something) so we have some idea where it is in long runs
    Last edited by DirtyPunk; 18th February 2016 at 18:18.

  14. #12
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    Urg you got spam filtered. Google!!

    Quote Originally Posted by DirtyPunk View Post
    I just hope LZSSE2 hasn't wasted too much of your time .
    Just my computer's!
    Last edited by cbloom; 3rd August 2016 at 20:00.

  15. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by DirtyPunk View Post
    Any advice on taming Nakamichi into a test-harness and which variant to use?
    Depends on a test harness. If you have a scripted harness that expects to have compressor binaries, I don't know what's your problem. So I guess you compile a single test binary for all compressors and need a library . If it's so, look into fsbench, it integrates several Nakamichi variants. Thought it's far behind Sanmayce's efforts, just as I am, so I won't suggest what to test. Frankly, it would be great if somebody made a wide test of all variants (with fsbench it's not *that* much of work).

  16. #14
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    I've implemented a much faster match finder for LZSSE2's optimal parse based on Charles's suggestions, so it isn't as unbearable now. There is a small cost to compression ratio and a corresponding slight decrease in the decompression speed benchmarks for those files (the decompression hasn't changed, but LZSSE2 loves more compression to get its speed). The updated code is up on GitHub. I've also created a single file compression/decompression utility, but I am going to test it a bit more tomorrow before uploading it, to make sure all the framing etc is working well. After that I will see about doing a Nakamichi comparison.
    Last edited by DirtyPunk; 20th February 2016 at 22:41.

  17. #15
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 199 Times in 147 Posts

    TurboBench & LZSSE

    I've integrated LZSSE (latest version from today) into TurboBench.

    I've made some minor changes to build on linux (see first lines in directory LZSSE_).
    You can make a build on linux or on windows (mingw-w64) with:

    Code:
    make LZSSE=1
    "sse4.1" capable CPU is required.
    You can also test Nakamichi (latest version) with TurboBench, although the compression is very slow for large files.

    And now a benchmark.
    CPU: Sandy bridge i7-2600k at 4.2GHz, gcc 5.2, ubuntu 15.10, single thread.
    - File app3.tar (binary Portable Apps Suite)


    Code:
          C Size  ratio%     C MB/s     D MB/s   Name            File              (bold = pareto) MB=1.000.0000
        54265486    54.2       2.11    3883.96   lzturbo 19      app3.tar
        55872095    55.8     121.10    3720.38   lzturbo 12      app3.tar
        57616229    57.6     312.96    3506.64   lzturbo 11      app3.tar
        61460108    61.4     684.49    3514.66   lzturbo 10      app3.tar
        62616599    62.6     246.83    2785.42   lzsse 8         app3.tar
        64507475    64.4     244.99    2216.69   lzsse 4         app3.tar
        65071637    65.0      28.37    1634.23   lzsse 2         app3.tar

  18. Thanks:

    DirtyPunk (21st February 2016)

  19. #16
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    Dang, I've changed this thing to build with gcc, just to learn machine where I did it lacks SSE4. That's what I call FAIL. Probably I should check my /proc/cpuinfo before diving into source...

  20. #17
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    lzturbo is very impressive! Also, good work on getting the code integrated so quickly.

    It looks like you're using LZSSE2 level 0 there in the benchmark, at level 16 the ratio is 59.43 (and decompression performance improves a bit).

    This file definitely shows the weakness of LZSSE2 with binary executable files though. It will be interesting to implement better compressors for LZSSE4 and LZSSE8 to see how they perform on an equal footing to LZSSE2 with files like this.

    I'm curious to integrate your changes for GCC to check out how the code-gen compares, I had to tweak the code quite a bit to get the performance out of Visual Studio's code-gen, I'm sure there are some gains to be had on GCC and Clang as well. Performance drops off drastically if there are any register spills in the code-gen.

    I'm also curious to get more benchmarks for different processors, locally I've only benchmarked on Haswell.

    Quote Originally Posted by dnd View Post
    I've integrated LZSSE (latest version from today) into TurboBench.

    I've made some minor changes to build on linux (see first lines in directory LZSSE_).
    You can make a build on linux or on windows (mingw-w64) with:

    Code:
    make LZSSE=1
    "sse4.1" capable CPU is required.
    You can also test Nakamichi (latest version) with TurboBench, although the compression is very slow for large files.

    And now a benchmark.
    CPU: Sandy bridge i7-2600k at 4.2GHz, gcc 5.2, ubuntu 15.10, single thread.
    - File app3.tar (binary Portable Apps Suite)


    Code:
          C Size  ratio%     C MB/s     D MB/s   Name            File              (bold = pareto) MB=1.000.0000
        54265486    54.2       2.11    3883.96   lzturbo 19      app3.tar
        55872095    55.8     121.10    3720.38   lzturbo 12      app3.tar
        57616229    57.6     312.96    3506.64   lzturbo 11      app3.tar
        61460108    61.4     684.49    3514.66   lzturbo 10      app3.tar
        62616599    62.6     246.83    2785.42   lzsse 8         app3.tar
        64507475    64.4     244.99    2216.69   lzsse 4         app3.tar
        65071637    65.0      28.37    1634.23   lzsse 2         app3.tar

  21. #18
    Member
    Join Date
    Feb 2016
    Location
    Perth, Australia
    Posts
    9
    Thanks
    11
    Thanked 3 Times in 1 Post
    A bit delayed (have unfortunately been busy with other things), but there is now a visual studio 2015 built x64 executable for lzsse to play with (https://github.com/ConorStokes/LZSSE...4550/lzsse.zip). The code is available from this repository https://github.com/ConorStokes/LZSSE, and can also be built under gcc (I've only tested Windows right now with mingw, but it should work on Linux) by generating the make file with premake4.

    There have been a few extra things that have been added to lzsse in the meantime (optimal parsers for lzsse8 and lzsse4, better performance for compression), they are also available from the executable and in the repository.

    The example executable doesn't provide timings and the compressed files have some small overhead over the used codec (block sizes, magic number, compression mode etc). It isn't really designed as a benchmark, more as an example.

  22. Thanks (3):

    Bulat Ziganshin (14th May 2016),comp1 (14th May 2016),Turtle (15th May 2016)

  23. #19
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    I just created a fork (LZSSE-SIMDe) which provides a version which should run anywhere (i.e., machines without SSE4.1, even on other architectures like ARM). It's mostly a PoC to see if a new project of mine, SIMDe, is feasible… I think it is.

    There is still a lot of work to be done on the SIMDe side of things, but hopefully this will be interesting for someone even in its current state.

    I don't want to steal LZSSE's thread to talk about SIMDe, but if there is interest PM me and I'll create a new thread for SIMDe…

    Update: I added some information to the README about performance (including some very basic benchmarking info).
    Last edited by nemequ; 23rd April 2017 at 08:13.

  24. Thanks (3):

    Cyan (23rd April 2017),DirtyPunk (26th April 2017),Shelwien (23rd April 2017)

Similar Threads

  1. Looking for a super-simple decompressor
    By m^2 in forum Data Compression
    Replies: 13
    Last Post: 26th April 2015, 07:36
  2. Intel brings SIMD to JavaScript
    By nburns in forum The Off-Topic Lounge
    Replies: 60
    Last Post: 22nd July 2014, 00:26
  3. Good free SIMD library for x86 SSE & ARM NEON?
    By Paul W. in forum The Off-Topic Lounge
    Replies: 2
    Last Post: 17th May 2014, 04:31
  4. Replies: 4
    Last Post: 8th March 2014, 14:50
  5. Fastest decompressor!?
    By Sanmayce in forum Data Compression
    Replies: 66
    Last Post: 13th April 2011, 01:18

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •