Page 1 of 2 12 LastLast
Results 1 to 30 of 32

Thread: Fast LZMA2 library with radix matchfinder

  1. #1
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts

    Fast LZMA2 library with radix matchfinder

    I've recently done more work on the buffered radix matchfinder I used in the experimental Radyx archiver (https://encode.su/threads/2134-Radyx-archiver). It's simpler and faster than the one in Radyx, and I doubt there's much more scope for improvement. The library is written in C. I used some performance tricks and portability code from Zstandard.

    I posted a 2-thread Silesia comparison graph in the github project: https://github.com/conor42/fast-lzma2
    The performance gap over 7-zip is even higher with 4 threads than 2.

    It's work in progress and there's no DLL build yet, only a fuzzer and a benchmark. I've made a 7-zip fork which uses the library by default, which I'll upload with binaries once I sort out building all the components.

    Regards,
    Conor

  2. Thanks (9):

    Bulat Ziganshin (14th January 2018),Cyan (15th January 2018),encode (16th January 2018),Gonzalo (14th January 2018),jibz (15th January 2018),khavish (14th January 2018),load (14th January 2018),Stephan Busch (15th January 2018),xcrh (16th January 2018)

  3. #2
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    236
    Thanked 90 Times in 70 Posts
    It would be really nice to give it a try. I really loved the old Radyx. Unfortunately, I don't have a Windows box around, but I could test under WINE if someone were to provide a binary. Thanks in advance to anyone who help with this.

  4. #3
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    I compiled the code with visual studio 2015 i got a .exe (bench.exe) and one dll (libflzma2.dll) don't know how to use
    I attach de binaries
    Attached Files Attached Files

  5. Thanks:

    Gonzalo (15th January 2018)

  6. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    827
    Thanks
    236
    Thanked 297 Times in 178 Posts
    Quote Originally Posted by Conor View Post
    I've recently done more work on the buffered radix matchfinder ...
    Would it make sense to implement it within brotli or zstandard?

  7. #5
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    236
    Thanked 90 Times in 70 Posts
    Quote Originally Posted by redrabbit View Post
    don't know how to use
    It is not intended to use as a file compressor / archiver, instead it performs a benchmark using the file you provide as argument.

    Code:
    bench.exe file.bin

  8. #6
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    236
    Thanked 90 Times in 70 Posts
    @Conor: I get an error on several files. The only thing I found in common between them is that they are already compressed (pdfs, mp3, xz, etc). They are of different sizes but there is always the same error:
    Code:
    FL2_decompressDCtx() failed on size 4294967286 : Corrupted block detected

  9. #7
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    It actually failed on compression but wasn't checked. The benchmark code is a little rougher than the rest. I halved the output buffer size to save memory, but never changed it back, so it's too small for incompressible data.
    Fixed now. Thanks for finding it.

    The 7-zip mod is nearly done so I'll upload binaries here and people can try archiving with it.

  10. Thanks:

    Stephan Busch (15th January 2018)

  11. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,562
    Thanks
    772
    Thanked 687 Times in 372 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Would it make sense to implement it within brotli or zstandard?
    i would say that it makes more sense to implement it with codecs faster than lzma because
    1) it losts some compression due to its block-wise nature
    2) sorting match finders have potential up to multi-GB/s processing speed

  12. #9
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Would it make sense to implement it within brotli or zstandard?
    I combined it with a zstd encoder a while back, and it works well for level 12 or 13 up. The problem with faster levels is they skip many positions without searching for matches, so an algorithm which searches all positions can't compete. It makes optimization cheaper so levels 4-12 of my lzma2 library use optimal parsing.

    Integration with zstd is messy. The threading model is different and so is the interaction between the matchfinder and encoder.

  13. #10
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    In case anyone wants to try this out on Linux (or anything other than VS): https://github.com/conor42/fast-lzma2/pull/1 (includes rudimentary CMake support).

  14. Thanks:

    Conor (16th January 2018)

  15. #11
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by nemequ View Post
    In case anyone wants to try this out on Linux (or anything other than VS): https://github.com/conor42/fast-lzma2/pull/1 (includes rudimentary CMake support).
    Merged. Thanks

  16. #12
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Is this compatible with existing lzma2 decoders? AFAICT it isn't, but the name makes it sound like it is… If it isn't you should probably make that clear in the README and project summary on github, and/or consider a different name.

  17. #13
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by nemequ View Post
    Is this compatible with existing lzma2 decoders? AFAICT it isn't, but the name makes it sound like it is… If it isn't you should probably make that clear in the README and project summary on github, and/or consider a different name.
    Yes but you need to drop the dict size property byte off the front of the stream and feed the rest into the decoder, or you can disable writing the prop byte. It's done in my 7-zip fork which uses the library.

  18. #14
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Heaps of testing done and DLLs for x86 and x64 are now beta-released on github: https://github.com/conor42/fast-lzma2/releases

    7-Zip 18.01 fork is released also: https://github.com/conor42/7-Zip-FL2/releases
    I found a way to add it as a new matchfinder by modifying the standard Lzma2Encoder object to redirect compression to the new library if HC or BT matchfinders are not specified. RMF is the default in this release.
    Last edited by Conor; 28th February 2018 at 07:05.

  19. Thanks (2):

    Bulat Ziganshin (30th January 2018),Zonder (5th February 2018)

  20. #15
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    After a great deal of work to get the library into a mature state, I have released v1.0.0. It includes some optimizations, including some of Igor Pavlov's recent work. The speed is up by around 5% - 8%. I did a massive amount of fuzz testing, and also archiving random file selections both in Radyx and 7-Zip. I think my computer is nearly worn out

    I've released only the command line tool from 7-Zip-FL2 19.00 with Fast LZMA2 as the default. All of the other 7-Zip components will be in the next 7-Zip-zstd release if Tino is happy with it.
    Edit: there's a 7-Zip plugin too.

    Finally, the XZ fork FXZ is working well. Reports of success building/testing it on non-x86 systems are most welcome
    Last edited by Conor; 29th March 2019 at 17:51.

  21. Thanks (5):

    Bulat Ziganshin (29th March 2019),Cyan (29th March 2019),Mike (29th March 2019),milky (29th March 2019),Shelwien (29th March 2019)

  22. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,897
    Thanks
    291
    Thanked 1,267 Times in 715 Posts
    Do you have any ideas about how to make a lzma matchfinder with explicit support for fuzzy matches?
    http://nishi.dreamhosters.com/u/lzmapat_v0.rar

  23. #17
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Shelwien View Post
    Do you have any ideas about how to make a lzma matchfinder with explicit support for fuzzy matches?
    http://nishi.dreamhosters.com/u/lzmapat_v0.rar
    I have no ideas about how to make it work well.

  24. #18
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by Shelwien View Post
    Do you have any ideas about how to make a lzma matchfinder with explicit support for fuzzy matches?
    http://nishi.dreamhosters.com/u/lzmapat_v0.rar
    Also, the LZMA optimizer has a related idea where it finds a combination of match, literal, rep match. The three together constitute a match containing 1 non-matching byte. The basic idea of rep matches can be seen as a workaround for doing fuzzy matching.

  25. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,897
    Thanks
    291
    Thanked 1,267 Times in 715 Posts
    Yes, that's why I asked. But lzma matchfinder only provides one match of each length, and rep-matches are later added during heuristic pattern check.
    I'm pretty sure that explicit fuzzy matchfinder can find better matches and improve compression even with current lzma format.
    But its also interesting to investigate different methods of fuzzy match coding.
    For example, instead of current match-literal-rep_match patterns, we could encode one long match, then some patch codes.
    Or there's the example of bsdiff, which actually also works with fuzzy matches, and handles them by separating literal and match streams -
    LZ part of bsdiff actually never changes the length of literal stream, but subtracts the matches from it, so some parts turn to zeroes,
    and some other parts (mostly offsets in executable code) turn to repeated deltas. Then later its processed with something like RLE to
    remove zeroes, but that's less relevant.

    Btw, with lzma2, there's another obvious option for compression improvement - lzma2 block headers can change lc/lp/pb options.
    Somehow I've never seen an encoder which could optimize these, though.

  26. #20
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Yes the current lzma only loses a small amount if the match-lit-rep pattern is removed, probably because the matchfinder is not designed to search for it.

    I've thought about optimizing lc/lp/pb for extreme compression. It would be best, but slow, to test all combinations. Maybe a statistical analysis could find the best numbers in advance.

  27. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,897
    Thanks
    291
    Thanked 1,267 Times in 715 Posts
    > Yes the current lzma only loses a small amount if the match-lit-rep pattern is removed,
    > probably because the matchfinder is not designed to search for it.

    I actually tested that: https://encode.su/threads/1288-LZMA-...ll=1#post25165
    Apparently some data types rely on rep0 (=fuzzy matches) quite a lot even as is.

    Btw, I found that unpacking the rep-tokens, then using a simple greedy parser to recreate them
    actually improves compression a little.

    > I've thought about optimizing lc/lp/pb for extreme compression. It would be best, but slow, to test all combinations.
    > Maybe a statistical analysis could find the best numbers in advance.

    No need to test all combinations.
    Either test a few popular ones (8/0/0 for text, 0/2/2 for 32-bit data, 3/0/2 the default, maybe a few more).
    Or test options separately (find best lp, then best lc, then pb up to lp).

  28. #22
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Version 1.0.1 release: https://github.com/conor42/fast-lzma2/releases

    This is mostly to make a proper POSIX build system. The root makefile builds a shared library, which can be installed with `make install`. There will be no 7-Zip or fxz releases with this version.

  29. Thanks:

    joerg (6th May 2019)

  30. #23
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    @Conor

    i am now using https://github.com/conor42/7-Zip-FL2/releases
    it performs very well .. faster

    will you release a new version of radyx ?

    https://github.com/conor42/Radyx/releases

    Thank you very much for your wonderfull programs!

    best regards

  31. #24
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by joerg View Post
    will you release a new version of radyx ?

    https://github.com/conor42/Radyx/releases
    I will at some point. When I get to it depends on how much work I have to do over the coming months.

  32. #25
    Member
    Join Date
    Mar 2016
    Location
    Croatia
    Posts
    189
    Thanks
    81
    Thanked 13 Times in 12 Posts
    is it possible to use this with normal 7-zip file manager on windows?...i would like to test it.

  33. #26
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by dado023 View Post
    is it possible to use this with normal 7-zip file manager on windows?...i would like to test it.
    Yes, you can select it in the file manager compress dialog in 7-Zip-zstd: https://github.com/mcmilk/7-Zip-zstd/releases

  34. Thanks:

    dado023 (16th May 2019)

  35. #27
    Member
    Join Date
    Apr 2017
    Location
    Bangladesh
    Posts
    13
    Thanks
    57
    Thanked 2 Times in 2 Posts
    Can you provide a pascal example for these dlls? I am not familiar with C++

  36. #28
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Quote Originally Posted by 78372 View Post
    Can you provide a pascal example for these dlls? I am not familiar with C++
    I haven't used pascal for a long time so there are no pascal examples, but if you look in the dev branch, test/file_test.c has a C example which is simpler than the C++ in 7-Zip-zstd.

  37. Thanks:

    78372 (13th August 2019)

  38. #29
    Member
    Join Date
    Feb 2015
    Location
    Australia
    Posts
    75
    Thanks
    13
    Thanked 67 Times in 25 Posts
    Released fxz 1.1.0alpha, including an Arch PKGBUILD. I think fxz is complete now if no issues emerge.

  39. Thanks:

    Gonzalo (22nd October 2019)

  40. #30
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    236
    Thanked 90 Times in 70 Posts
    Lil bug report... Very unable to build. System: Manjaro (up to date). Method: PKGBUILD


    --------------------------

    EDIT: Manual compilation OK. Testing now. Anything we should know about testing? For example, max RAM usage for each preset, or if this is compatible with all the 7-zip flzma2 options
    Attached Files Attached Files
    Last edited by Gonzalo; 23rd October 2019 at 02:43.

Page 1 of 2 12 LastLast

Similar Threads

  1. TarsaMatchFinder - complete match finder based on radix sort
    By Piotr Tarsa in forum Data Compression
    Replies: 14
    Last Post: 29th March 2017, 00:18
  2. lzma2 stream detector
    By Shelwien in forum Data Compression
    Replies: 0
    Last Post: 4th June 2016, 19:28
  3. Radix sort match finder
    By Conor in forum Data Compression
    Replies: 17
    Last Post: 13th February 2015, 13:19
  4. lzma's binary tree matchfinder
    By Shelwien in forum Data Compression
    Replies: 7
    Last Post: 13th October 2011, 08:38
  5. Nibble MMC matchfinder
    By Shelwien in forum Data Compression
    Replies: 4
    Last Post: 25th April 2011, 12:21

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •