Thread: 7-Zip binaries using the Fast LZMA2 lib

  1. #31
    Member
    Join Date
    Sep 2011
    Location
    uk
    Posts
    239
    Thanks
    192
    Thanked 17 Times in 12 Posts
    Thanks. It would be better if you renamed the key files, 7z.exe and 7z.dll, so they don't get confused with the real 7z and can be put into the same directory, e.g. 7zf.exe and 7zf.dll. And maybe 7zfm too? j

  2. #32
    Member
    Join Date
    Aug 2016
    Location
    Russia
    Posts
    108
    Thanks
    6
    Thanked 70 Times in 37 Posts
    Do you plan to create a separate codec DLL that can be used with any 7z host app?

  3. #33
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 199 Times in 147 Posts
    I've added fast-lzma2 to the TurboBench Compression Benchmark.
    There is a conflict with the ZSTD library in "pool.c". I tried to test multithreading by compiling TurboBench without ZSTD, but fast-lzma2 crashes when you specify more than 2 threads.

    Here are the results without multithreading on a Skylake i7-6700 at 3.4 GHz.
    Benchmark with lzma-sdk v18.01

    Files from the Compression Benchmark
    Code:
          C Size    ratio%    C MB/s     D MB/s   Name        File
        48758739    23.0       2.47      81.17   lzma 9       silesia.tar
        49515082    23.4       4.84      79.77   flzma2 9     silesia.tar
     
        32823983    32.8       3.38      55.53   lzma 9       app3.tar
        39558020    39.5       5.22      55.03   flzma2 9     app3.tar
    
         7992210    25.0       4.21      66.85   lzma 9       pd3d.tar
         8108746    25.4       6.49      66.90   flzma2 9     pd3d.tar
    
    
        24861228    24.9       1.49      83.07   lzma 9       enwik8
        26465055    26.5       4.46      78.64   flzma2 9     enwik8
    
        13598062     6.6       2.70     260.41   lzma 9       access_log_Jul95
        13612075     6.6       3.51     243.60   flzma2 9     access_log_Jul95

  4. Thanks:

    Conor (10th March 2018)

  5. #34
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by dnd View Post
    I've added fast-lzma2 to the TurboBench Compression Benchmark.
    There is a conflict with the ZSTD library in "pool.c". I tried to test multithreading by compiling TurboBench without ZSTD, but fast-lzma2 crashes when you specify more than 2 threads.
    Thanks for the test results. I'll rename the functions from Zstd in the next release which will be very soon. Please note that level 9 is not comparable to lzma 9. Level 11 or 12 is a better comparison.

    EDIT: Please update your copy with the attached source files and delete pool.*
    Attached Files
    Last edited by Conor; 10th March 2018 at 04:57.

  6. #35
    Member Zonder's Avatar
    Join Date
    May 2008
    Location
    Home
    Posts
    55
    Thanks
    20
    Thanked 6 Times in 5 Posts
    7-Zip v18.03 FL2 v0.9.2 beta is compiled with debug settings (it needs special DLLs); please compile it as a release build.
    How about multithreaded decompression with the radyx match finder? How about a bigger dictionary, maybe 2 GB or even 4 GB? By the way, on testset1 below, dictionary size doesn't matter.

    Code:
    Testset1 - 3794MB                                    |    C-Size    |  C-time  |  C-Ram  | D-speed | D-Ram
    7-Zip v18.03 lzma d1024mb ultra mt2 qs               | 1419.789.767 | 00:23:00 | 10820MB |  58MB/s | 1041MB
    7-Zip v18.03 lzma2 d512mb ultra mt4 qs               | 1428.925.790 | 00:11:42 | 10736MB | 127MB/s | 2871MB
    7-Zip v18.03 FL2 v0.9.2 flzma2 d1024 x12 fb64 mt4 qs | 1439.799.697 | 00:07:38 |  6254MB |  56MB/s | 1043MB

    If someone needs debug x64 dlls:
    Attached Files

  7. #36
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    976
    Thanks
    266
    Thanked 350 Times in 221 Posts
    Quote Originally Posted by dnd View Post
    fast-lzma2
    Large-window brotli can be interesting competition for it. Consider adding it in the benchmarks.

  8. #37
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Zonder View Post
    7-Zip v18.03 FL2 v0.9.2 beta is compiled with debug settings (it needs special DLLs); please compile it as a release build.
    How about multithreaded decompression with the radyx match finder? How about a bigger dictionary, maybe 2 GB or even 4 GB? By the way, on testset1 below, dictionary size doesn't matter.
    Not sure what happened there. I used the makefiles as always, so maybe debug settings were added to them somewhere. I'll take a look.

  9. #38
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    That was weird. The debug binary for the GUI somehow ended up in the output folder for the make script. I've replaced it with the correct file in the release.

  10. Thanks:

    load (10th March 2018)

  11. #39
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    573
    Thanks
    245
    Thanked 98 Times in 77 Posts
    Great work, Conor! As always

  12. #40
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 199 Times in 147 Posts
    Thanks, it is working after your changes.

    - lzma uses only one thread, even when the number of threads is set to 2 in the call to LzmaEncode.
    Therefore, we can't make a reasonable multithreaded comparison.
    - fast-lzma2 is a lot faster with 2 threads. Excellent work!
    - This benchmark clearly shows that multithreaded match finders are memory bound.
    There is no speedup with more than 2 threads, at least on my test systems, a Skylake i7-6700 and an overclocked Sandy Bridge i7-2600K.
    This also applies to lzham and, in general, to all programs that access large regions of memory in a more or less random pattern.
    - brotli compression seems to work with a larger window (>16MB), but the decompressed buffer doesn't match.
    - I've included the peak memory usage for compression and decompression.

    TurboBench compression benchmark on a Skylake i7-6700 at 3.4 GHz, Ubuntu 17.10, gcc 7.2. File: silesia.tar
    Code:
         C Size  ratio%    C MB/s    D MB/s  Name             C Peak Mem  D Peak Mem
     48,758,739    23.0      2.47     81.17  lzma 9          604,629,600      15,992
     49,013,348    23.1      4.23     80.62  flzma2 11       675,964,240      57,208
     49,021,875    23.1      7.91     80.67  flzma2 11mt2    680,049,408      57,208
     49,030,944    23.1      7.87     80.69  flzma2 11mt4    688,219,312      57,208
     49,034,963    23.1      7.82     80.69  flzma2 11mt8    704,560,056      57,208
     50,470,286    23.8      0.47    373.42  brotli 11d24    264,516,392  18,887,456
     50,861,542    24.0      1.68    269.97  lzham 4       5,406,315,184      43,104
     50,861,542    24.0      2.39    271.98  lzham 4t2     5,410,490,888      43,104
     50,861,542    24.0      2.35    271.97  lzham 4t4     5,411,539,608      43,104
     50,861,542    24.0      2.24    271.29  lzham 4t8     5,414,686,760      43,104
    211,948,036   100.0  13148.96  13543.87  memcpy        silesia.tar

  13. #41
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Thanks for the test results. There are definitely only 2 cores available to the program. Speed with 4 threads should be at least 13 MB/s. Most of the performance improvement comes from reducing random memory access within a large block to improve cache efficiency, so it still works well at 4 or even 8 threads.

  14. #42
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Aniskin View Post
    Do you plan to create a separate codec DLL that can be used with any 7z host app?
    That can be done easily once I find out how codec DLLs work. Can you point me to an example?

  15. #43
    Member
    Join Date
    Aug 2016
    Location
    Russia
    Posts
    108
    Thanks
    6
    Thanked 70 Times in 37 Posts
    Quote Originally Posted by Conor View Post
    Can you point me to an example?
    Not sure that I understand your question correctly. Example of what?

  16. #44
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Aniskin View Post
    Not sure that I understand your question correctly. Example of what?
    Source code of an external codec plugin.

  17. #45
    Member
    Join Date
    Aug 2016
    Location
    Russia
    Posts
    108
    Thanks
    6
    Thanked 70 Times in 37 Posts
    The 7Zip-zstd project has codecs, and you can examine its source code (CPP\7zip\Bundles\Codec_*). I could create a sample, but I am a Delphi developer and my sources would be useless to a C++ developer.

  18. Thanks:

    Conor (11th March 2018)

  19. #46
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Aniskin View Post
    The 7Zip-zstd project has codecs, and you can examine its source code (CPP\7zip\Bundles\Codec_*). I could create a sample, but I am a Delphi developer and my sources would be useless to a C++ developer.
    That will have everything I need

  20. #47
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 199 Times in 147 Posts
    I'm not sure how the Skylake system is configured; I only have remote access.
    I've now included the test on an i7-2600K CPU at 3.4 GHz.
    Scaling is better with small files, but the benefit of more threads diminishes as the files get larger.
    Anyone can download "TurboBench" and rerun the benchmarks.

    Code:
          C Size  ratio%     C MB/s     D MB/s   Name        
        48758739    23.0       2.14      76.66   lzma 9                           
        49013348    23.1       3.69      77.63   flzma2 11                      
        49021875    23.1       5.94      77.62   flzma2 11mt2              
        49030944    23.1      12.08      77.61   flzma2 11mt4             
        49034963    23.1      15.13      77.62   flzma2 11mt8           
        50861542    24.0       1.39     222.60   lzham 4                          
        50861542    24.0       3.47     232.17   lzham 4t4                        
        50861542    24.0       1.90     227.12   lzham 4t2
    Last edited by dnd; 11th March 2018 at 21:03.
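
    For reference, the thread scaling implied by the flzma2 rows in the table above can be worked out by dividing each multithreaded speed by the single-threaded one. This is only a small sketch: the speeds are copied from the table, and the helper name scaling is made up for the example.

```python
# Compression speeds (MB/s) for flzma2 level 11 on the 2600k system,
# copied from the table above. Keys are thread counts.
speeds = {1: 3.69, 2: 5.94, 4: 12.08, 8: 15.13}


def scaling(speeds_by_threads: dict) -> dict:
    """Return {threads: (speedup, efficiency)} relative to the 1-thread speed.

    Efficiency is the speedup divided by the thread count, so 1.0 would
    mean perfectly linear scaling.
    """
    base = speeds_by_threads[1]
    return {t: (s / base, s / base / t) for t, s in speeds_by_threads.items()}


for threads, (speedup, efficiency) in scaling(speeds).items():
    print(f"{threads} threads: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} per-thread efficiency")
```

    The same function can be fed the Skylake numbers from the earlier post to compare how the two machines scale.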

  21. #48
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Results from turbobench, gcc 7.2.0, i5 2500 at 3.3 GHz. Your results on the 2600k suggest another process was loading the CPU a bit. That would impact mt4 results more than the others.

          C Size  ratio%     C MB/s     D MB/s   Name
        48761289    23.0       2.06      78.76   lzma 9
        48756483    23.0       3.17      79.05   lzma 9:mt2
        49001957    23.1       3.59      79.73   flzma2 11
        49003096    23.1       6.46      79.66   flzma2 11:mt2
        49008164    23.1      11.07      80.10   flzma2 11:mt4

  22. #49
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 199 Times in 147 Posts
    I've run the test for 4 and 8 threads again.
    I'm indeed getting better numbers for 4/8 threads: 12.08 instead of 9.87 MB/s for 4 threads. The lzham numbers for 4 threads are also better.
    On my system, the timings are not as stable as with single-threaded benchmarking.
    Note that lzma 9 with 2 threads works only on Windows.
    Very good work!

    By the way, in LzTurbo I'm using cache-efficient compact tries, but multithreading is not implemented.
    LzTurbo considers all possible matches at each position.
    It is not clear which data structures you are using. Binary trees?
    Last edited by dnd; 11th March 2018 at 21:53.

  23. #50
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by dnd View Post
    It is not clear which data structures you are using. Binary trees?
    The basic data structure is an array of integer pairs, one pair for each byte in the data block. Each pair consists of the index of the longest previous match in the data block, and the match length. For dictionaries up to 64 MB and a maximum match length below 64 bytes, the pair is packed into a single 32-bit value. For cache efficiency, each chain of unique 2-byte or 4-byte matches is copied into a buffer, resolved up to the maximum match length, and copied back.

    The basic algorithm without buffering is in the functions RadixInitReference and RecurseListsReference in radix_engine.h
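
    As a rough illustration of the packing described above (not the library's actual code): a 64 MB block needs 26 bits for an index, which leaves 6 bits of a 32-bit word for a length below 64. The bit layout and the helper names pack_match and unpack_match are assumptions made for this sketch; radix_engine.h may arrange the fields differently.

```python
# Hypothetical layout, assumed for illustration only:
# low 26 bits = index of the previous match, high 6 bits = match length.
INDEX_BITS = 26                      # 2**26 bytes = 64 MB
LENGTH_BITS = 32 - INDEX_BITS        # 6 bits, so lengths 0..63
INDEX_MASK = (1 << INDEX_BITS) - 1
MAX_LENGTH = (1 << LENGTH_BITS) - 1


def pack_match(index: int, length: int) -> int:
    """Pack an (index, length) pair into one unsigned 32-bit value."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in 26 bits")
    if not 0 <= length <= MAX_LENGTH:
        raise ValueError("length does not fit in 6 bits")
    return (length << INDEX_BITS) | index


def unpack_match(value: int) -> tuple:
    """Recover the (index, length) pair from a packed 32-bit value."""
    return value & INDEX_MASK, value >> INDEX_BITS


if __name__ == "__main__":
    packed = pack_match(index=12345, length=17)
    print(hex(packed), unpack_match(packed))
```

    Keeping each entry to 4 bytes halves the table size compared with two separate 32-bit fields, which is presumably part of the cache benefit mentioned above.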

  24. Thanks:

    dnd (12th March 2018)

  25. #51
    Member
    Join Date
    May 2015
    Location
    ~
    Posts
    11
    Thanks
    2
    Thanked 5 Times in 2 Posts
    Thank you, Conor!
    FLZMA2 is a substantial improvement over vanilla LZMA2

    I have two questions for you if you'd be so kind as to answer:

    1) Why is radyx so much faster than 7z-FLZMA2?
    2) Do you plan on having a look at extraction speed? Archives generated by vanilla LZMA2 seem to extract quite a bit faster with the optimisations in 7z 18.xx.

    Code:
    Archiver           | File Size | Elapsed | per Sec | 7z x
    -------------------+-----------+---------+---------+------
    7za a -mx6 -mmt4   | 137967163 |   33.8s |    3.89 | 5.1s
    7zfl2 a -mx7 -mmt4 | 137746718 |   26.3s |    4.99 | 5.6s
    radyx a -mx7 -mmt4 | 138192220 |   17.4s |    7.57 | 5.4s

  26. #52
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by choochootrain View Post
    1) Why is radyx so much faster than 7z-FLZMA2?
    2) Do you plan on having a look at extraction speed? Archives generated by vanilla LZMA2 seem to extract quite a bit faster with the optimisations in 7z 18.xx.
    This is a nice example of why in-memory benchmarks are the most accurate. Much time is spent reading your input from disk. Radyx uses an extra buffer to read during compression, but 7z does not, or at least not a large buffer. Radyx uses the same compression code as the DLL in 7z-FL2 so it runs at about the same speed.

    7-Zip's own decoder is used for all LZMA2 decoding in 7z-FL2 (the FL2 DLL lacks the optimized decoder but the latest source has it). The speed difference occurs because 7-Zip does a dictionary reset part way through the compressed stream, so an extra decoder thread can start decoding there in parallel. FL2 never resets the dictionary so decoding is single-threaded. I will modify it to add dictionary resets. This decreases compression ratio a little but I'll see how it turns out.
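
    The idea that independent chunks can be decoded in parallel can be sketched with Python's standard lzma module. This is a stand-in under stated assumptions, not how 7-Zip or FL2 do it: each chunk here is a complete separate stream rather than an LZMA2 chunk following a dictionary reset, and the function names are invented for the example.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_independent(data: bytes, chunk_size: int) -> list:
    """Compress each chunk as its own stream.

    No chunk can reference bytes from an earlier chunk, which is what
    makes parallel decoding possible and also costs some ratio.
    """
    return [lzma.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def decompress_parallel(chunks: list, workers: int = 4) -> bytes:
    """Decode the chunks concurrently; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, chunks))


if __name__ == "__main__":
    original = b"fast lzma2 " * 100_000
    chunks = compress_independent(original, chunk_size=256 * 1024)
    assert decompress_parallel(chunks) == original
    print(len(chunks), "chunks,", sum(map(len, chunks)), "compressed bytes")
```

    Comparing the summed chunk sizes against len(lzma.compress(original)) gives a feel for the ratio loss from cutting the match history at each chunk boundary.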

  27. Thanks:

    choochootrain (13th March 2018)

  28. #53
    Member
    Join Date
    Aug 2016
    Location
    Russia
    Posts
    108
    Thanks
    6
    Thanked 70 Times in 37 Posts
    This may be of interest to someone: I created a separate FLZMA2 plugin that can be used with baseline 7-Zip.

  29. Thanks (2):

    load (30th April 2018),Piglet (30th April 2018)

  30. #54
    Member
    Join Date
    Sep 2011
    Location
    uk
    Posts
    239
    Thanks
    192
    Thanked 17 Times in 12 Posts
    • 7-Zip 18.05 was released.
      7-Zip for 32-bit Windows:
      http://7-zip.org/a/7z1805.exe
      or
      http://7-zip.org/a/7z1805.msi
      7-Zip for 64-bit Windows x64:
      http://7-zip.org/a/7z1805-x64.exe
      or
      http://7-zip.org/a/7z1805-x64.msi
      What's new after 7-Zip 18.01:

      • The speed for single-thread LZMA/LZMA2 decoding
        was increased by 30% in x64 version and by 3% in x86 version.
      • 7-Zip now can use multi-threading for 7z/LZMA2 decoding,
        if there are multiple independent data chunks in LZMA2 stream.
      • 7-Zip now can use multi-threading for xz decoding,
        if there are multiple blocks in xz stream.
      • The speed for LZMA/LZMA2 compressing was increased
        by 8% for fastest/fast compression levels and
        by 3% for normal/maximum compression levels.
      • 7-Zip now shows Properties (Info) window and CRC/SHA results window
        as "list view" window instead of "message box" window.
      • Some improvements in zip, hfs and dmg code.
      • Previous versions of 7-Zip could work incorrectly in "Large memory pages" mode in
        Windows 10 because of some BUG with "Large Pages" in Windows 10.
        Now 7-Zip doesn't use "Large Pages" on Windows 10 up to revision 1709 (16299).
      • The vulnerability in RAR unpacking code was fixed (CVE-2018-10115).
      • Some bugs were fixed.
      • New localization: Kabyle.

  31. Thanks:

    Conor (3rd May 2018)

  32. #55
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Thanks, I'll update the repo soon.

  33. Thanks:

    load (3rd May 2018)
