
Thread: LZF - Optimized LZF compressor

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts

    Smile LZF - Optimized LZF compressor

    A quick VS2013 compile is here:
    http://compressme.net/

    LTCB results:
    Code:
    Z:\>lzf c enwik8 enwik8.z
    Compressing enwik8: 100000000->48947532 in 0.924s
    
    Z:\>lzf cx enwik8 enwik8.z
    Compressing enwik8: 100000000->46318130 in 1.611s
    
    Z:\>lzf d enwik8.z e8
    Decompressing enwik8.z: 46318130->100000000 in 0.244s
    
    Z:\>lzf c enwik9 enwik9.z
    Compressing enwik9: 1000000000->440862551 in 8.503s
    
    Z:\>lzf cx enwik9 enwik9.z
    Compressing enwik9: 1000000000->416377741 in 14.477s
    
    Z:\>lzf d enwik9.z e9
    Decompressing enwik9.z: 416377741->1000000000 in 2.271s
It is compatible with the LibLZF compression library, but not with the original LZF compressor (my LZF produces raw compressed streams with no headers and/or checksums).
Also, keep in mind that my LZF is not based on or derived from that library - I've tested many (well, nearly all) LZ77/LZSS ideas and designed this compression format independently; only after comparing it to existing compressors did I find that it is similar to LZF, so I named it accordingly. Actually, it borrows some ideas from LZCB (by Charles Bloom) and is also somewhat similar to LZOP. Thus, the original LZF carries no new data compression ideas or tricks.
All in all, it is a very fast byte-aligned LZ77 with an 8 KB window!
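
For reference, here is a minimal decoder sketch for such a raw stream, assuming the standard LibLZF token layout (control bytes below 0x20 are literal runs; anything else is a match with a 3-bit length field, one extra length byte when that field saturates, and a 13-bit offset). It illustrates the format only - it is not the LZF tool's actual source:
Code:
// Sketch of a decoder for a raw LibLZF-compatible stream (assumed layout, see liblzf).
// ctrl < 0x20 : literal run of (ctrl + 1) bytes follows.
// otherwise   : match; length = (ctrl >> 5) + 2, plus an extra byte when the 3-bit field is 7;
//               offset = ((ctrl & 0x1F) << 8 | next byte) + 1  -> up to 8192 bytes back (8 KB window).
#include <cstddef>

size_t lzf_decode(const unsigned char* in, size_t in_len,
                  unsigned char* out, size_t out_cap)
{
    const unsigned char* ip = in;
    const unsigned char* in_end = in + in_len;
    unsigned char* op = out;
    unsigned char* out_end = out + out_cap;

    while (ip < in_end) {
        unsigned ctrl = *ip++;
        if (ctrl < 0x20) {                            // literal run
            unsigned len = ctrl + 1;
            if (op + len > out_end) return 0;         // output would overflow
            while (len--) *op++ = *ip++;
        } else {                                      // back reference
            unsigned len = ctrl >> 5;
            const unsigned char* ref = op - ((ctrl & 0x1F) << 8) - 1;
            if (len == 7) len += *ip++;               // long match: extra length byte
            ref -= *ip++;                             // low byte of the offset
            len += 2;
            if (op + len > out_end || ref < out) return 0;
            while (len--) *op++ = *ref++;             // byte-by-byte copy handles overlaps
        }
    }
    return (size_t)(op - out);                        // decompressed size
}
The encoder is just the mirror image: look for a match within the last 8 KB and emit either a literal-run token or a match token.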

    Well, enjoy new release!


  2. Thanks (6):

    avitar (29th October 2013),GOZARCK (29th October 2013),Matt Mahoney (29th October 2013),Nania Francesco (30th October 2013),nburns (29th October 2013),Paul W. (31st October 2013)

  3. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Updated LTCB. I used your timing results and it looks like it made the Pareto frontier. Same hardware? http://mattmahoney.net/dc/text.html#4164

  4. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
    Yep, it's my 3770K PC

  5. #4
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Wait till Matt tests it on his really slow computer. Then it won't look as hot.

  6. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
Well, actually, the goal of my LZF is to produce the smallest possible output with such a simple compression format - much like optimized deflate tools.
And it can be faster, for sure. Currently it's just a Visual Studio compile that isn't super-optimized; an Intel C++ build might be a bit faster. In addition, since the decoder is very simple (the simplest), it can be implemented in ASM in a minute. Please read the original LZF format description.

  7. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
Haste makes waste - a quick performance fix / recompile - LZF v1.01 is here:
    http://compressme.net/

    + Optimized compression routines
+ Recompiled using VS2013 Express (vs VS2013 RC). Tested the latest Intel C++ and VS2010 - VS2013 compiled the fastest code! Now it's my favorite!

    New ENWIK9 timings:
    Code:
    Z:\>lzf c enwik9 enwik9.z
    Compressing enwik9: 1000000000->440862551 in 7.971s
    
    Z:\>lzf cx enwik9 enwik9.z
    Compressing enwik9: 1000000000->416377741 in 12.464s
    
    Z:\>lzf d enwik9.z e9
    Decompressing enwik9.z: 416377741->1000000000 in 2.262s

  8. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
    + Optimized compression routines
+ Recompiled using VS2013 Express (vs VS2013 RC). Tested the latest Intel C++ and VS2010 - VS2013 compiled the fastest code! Now it's my favorite!
    You're not giving away the source code? Are you sure you didn't miss any options on the Intel compiler that might have helped (or use any that might have hurt)? Since you're comparing compilers, it would be interesting to see how gcc and clang do.

  9. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
Intel C++ is not always faster. Visual C++ 2013 is goddamn good! Actually, I briefly tested ICL with the switches I usually use, and in many cases Intel is indeed faster! However, in some cases (LZF, PPMX) it is not.
The LZF source is not compatible with GCC - I make heavy use of Microsoft Visual C++ specific stuff and I'm happy with it.
As to open source - I release the source of a program only if it's obsolete in terms of novel compression ideas and/or implementation tricks. The new LZF is a good base for any LZ77 (LZH, LZ77+ARI) compressor, and it is a step forward from, e.g., the CRUSH internals. There's no such thing as a free lunch! This is my policy.

  10. Thanks:

    avitar (30th October 2013)

  11. #9
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
Intel C++ is not always faster. Visual C++ 2013 is goddamn good! Actually, I briefly tested ICL with the switches I usually use, and in many cases Intel is indeed faster! However, in some cases (LZF, PPMX) it is not.
The LZF source is not compatible with GCC - I make heavy use of Microsoft Visual C++ specific stuff and I'm happy with it.
    I'll do you a favor and translate it into straight C if you want.



C++ is a mixed blessing, because it tends to make your code dependent on specific compilers. Nothing C++ does for you is any help in writing a tightly optimized routine, because you can't afford to blindly trust the compiler and libraries to do things for you anyway, right? The only thing it has over C that might tend, on balance, to make code more performant is templates, but all they do is copy and paste - nothing you couldn't do without.

    Edit: Don't take this too critically -- you obviously get superb code out of VC++, and I'm interested to know how. I know Microsoft treats their C compiler with a bit of neglect, and writing code that extracts every last drop of performance from the CPU isn't a subject I'm all that well versed in. Portability is worth aiming for, though, and -- all else being equal -- I would expect that C can be optimized just as effectively as C++.
    Last edited by nburns; 30th October 2013 at 18:46.

  12. Thanks:

    avitar (30th October 2013)

  13. #10
    Member
    Join Date
    Nov 2012
    Location
    Bangalore
    Posts
    114
    Thanks
    9
    Thanked 46 Times in 31 Posts
    Quote Originally Posted by nburns View Post
    ... Portability is worth aiming for, though, and -- all else being equal -- I would expect that C can be optimized just as effectively as C++.
IMHO, for low-level optimized code, C++ is just syntactic sugar compared to C. However, C++ actually eases the development of large, complex applications.

  14. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Updated results on my slow computer http://mattmahoney.net/dc/text.html#4164

  15. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Updated results on my slow computer http://mattmahoney.net/dc/text.html#4164
    My impression is that you probably have a few computers, but one is really old and slow. That would be the one you used to benchmark my BWTS code.

  16. #13
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by moinakg View Post
IMHO, for low-level optimized code, C++ is just syntactic sugar compared to C. However, C++ actually eases the development of large, complex applications.
    It is almost entirely syntactic sugar on top of C. Before it had its own compilers, it had a program that just expanded the sugar back to C and you compiled the C. Even now they share the same compiler.

    Nothing stops you from writing parts of your app in C and the rest in C++. Nothing, except shaming from C++ zealots.

  17. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Mostly I use one computer for Windows and a slightly newer one for Linux. Well, I don't see any good reason to get a new one as long as the old ones still work, and new ones for the same price aren't much faster. It seems like Moore's Law has stalled in the last few years, or maybe it has just shifted to phones and tablets.

    My original LTCB results were on an even older computer. I am kind of encouraging people to buy fast computers and run their own tests.

  18. #15
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Mostly I use one computer for Windows and a slightly newer one for Linux. Well, I don't see any good reason to get a new one as long as the old ones still work, and new ones for the same price aren't much faster. It seems like Moore's Law has stalled in the last few years, or maybe it has just shifted to phones and tablets.

    My original LTCB results were on an even older computer. I am kind of encouraging people to buy fast computers and run their own tests.
I haven't followed CPUs much lately, but if Moore's law stopped, it's news to me. The problem is that Moore's law is only incidentally related to performance. I'm going out on a limb here a bit with my knowledge of electronics, but I think what physically mattered to performance was the amount of work needed to charge up the chip each clock cycle, which I believe corresponds to this equation I stole from Wikipedia:

[attached image: the equation from Wikipedia]
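
A plausible reconstruction of that equation, given the q and C mentioned below: the energy needed to place a charge q on a capacitance C,

W = \frac{q^2}{2C} = \frac{1}{2} C V^2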

    For a while, Moore's law helped by shrinking the amount of charge needed (q), but, eventually, as the wires got too small, the bottom dropped out on the ability to hold the charge (capacitance, C). So the amount of work went down for a while and then hit bottom, which was probably predictable from elementary principles all along. We just didn't know when it would happen. It turned out to be around 2006.

    Every performance increase since has been about rearranging transistors to find the design that executes the most instructions per clock, but that is heading for some constant limit. Rearranging transistors gives diminishing returns.

    That's the grim truth as I see it.
    Last edited by nburns; 31st October 2013 at 09:45.

  19. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 795 Times in 488 Posts
    Clock speeds stalled several years ago. The next barrier is transistor size. Now we have 22 nm features, which is about 100 atoms. You can't make components smaller than atoms.

  20. #17
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    The next barrier is transistor size.
Transistor size isn't directly related to processing speed, though. Smaller transistors helped only until we hit a minimum in the energy curve. Shrinking gives you more of them for the same cost, but you still have to contain the power, and you have to find something meaningful to do with them. The variable most closely dependent on size is cost, because more chips fit on a wafer.

    The width of a feature shrinks by the square root of 2 each generation, so you can calculate the ultimate limit on Moore's law. If I did the math right, that gives us 20 years until we hit an atom.
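
A rough back-of-the-envelope version of that calculation (assuming ~22 nm features today, roughly 0.2 nm per atom, and about a year and a half per generation):

n = \frac{\ln(22 / 0.2)}{\ln\sqrt{2}} \approx 13.6 \text{ generations}, \qquad 13.6 \times 1.5 \text{ yr} \approx 20 \text{ years}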
    Last edited by nburns; 1st November 2013 at 02:33.

  21. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
    g++ compile (MinGW g++ (GCC) 4.8.1)
    Code:
    Z:\>lzf cx enwik9 enwik9.z
    Compressing ENWIK9: 1000000000->416377741 in 13.815s
    
    Z:\>lzf d enwik9.z e9
    Decompressing enwik9.z: 416377741->1000000000 in 2.68s
    i.e. - it is slower...

  22. #19
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
    g++ compile (MinGW g++ (GCC) 4.8.1)
    Code:
    Z:\>lzf cx enwik9 enwik9.z
    Compressing ENWIK9: 1000000000->416377741 in 13.815s
    
    Z:\>lzf d enwik9.z e9
    Decompressing enwik9.z: 416377741->1000000000 in 2.68s
    i.e. - it is slower...
    So it's about 11% slower for compression and about 19% slower for decompression? It's a step closer to being portable, though, and you can't argue with that.

  23. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
    Yep, but I removed Unicode and Large File support for that...

  24. #21
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    688
    Thanks
    41
    Thanked 173 Times in 88 Posts
    Quote Originally Posted by encode View Post
    Intel C++ is not always faster. Visual C++ 2013 is goddamn good!
Sad but true - the Intel Compiler is no longer a sure win. New versions have a tendency to generate slower code than previous ones in some cases, and both GCC and MSVC quite often beat IC. For example, the IC builds of reflate's DLLs are notably slower than the MSVC ones, and the SREP GCC builds are faster (at least in some modes) than the IC ones. Probably this was the reason Bulat made the GCC compiles the default.
From another point of view, there is PGO in IC. Sometimes it can give a really good boost.
Anyway, I talked with Shelwien months ago about IntelC, and in his opinion the best version (in terms of speed) is 11.1, so if you're going to squeeze out the maximum speed, that version is probably worth trying.

  25. #22
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
    Yep, but I removed Unicode and Large File support for that...
Why do you need Unicode support?

  26. #23
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
Unicode is a must-have these days! For example, LZF can handle Russian, Chinese, etc. file names. Why not?

  27. #24
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
Unicode is a must-have these days! For example, LZF can handle Russian, Chinese, etc. file names. Why not?
OK, it makes sense in the file name; it wouldn't make sense inside the file. I'm not familiar with how Windows file names work. Are they UTF-16 like everything else? Of course, there must be portable ways to handle that.
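
One common pattern (a sketch of the usual approach, not of how LZF actually handles it - the function name here is illustrative): on Windows, paths are UTF-16, so use the wide-character CRT entry points there and plain fopen() elsewhere:
Code:
#include <cstdio>

// Open an input file given the platform's native path type.
// On Windows, wmain()/_wfopen() deliver UTF-16 paths directly;
// on POSIX systems, paths are byte strings (commonly UTF-8) and fopen() is enough.
#ifdef _WIN32
FILE* open_input(const wchar_t* path) {
    return _wfopen(path, L"rb");
}
#else
FILE* open_input(const char* path) {
    return std::fopen(path, "rb");
}
#endif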

  28. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
    To sum up, here's the comparison of byte-aligned compressors I've released so far.

    Test file - bf4.exe - Battlefield 4 game 64-bit executable (37,079,552 bytes)

[comparison table not preserved in the archived post]

    Compressors marked with * are not byte-aligned. Yep, LZOP uses different algorithms - not just one.

And yes, an 8 KB window is too small for the real world.


  29. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
If we get rid of the slow getc()/putc(), changing the interface from buf->file / file->buf to buf->buf, we get:
    Code:
    Z:\>lzf c enwik9 enwik9.z
    Compressing enwik9: 1000000000->440862791 in 6.926s
    
    Z:\>lzf cx enwik9 enwik9.z
    Compressing enwik9: 1000000000->416377981 in 11.497s
    
    Z:\>lzf d enwik9.z e9
    Decompressing enwik9.z: 416377981->1000000000 in 1.825s
The size differs because here we need to store a 32-bit compressed size for each compressed block.
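
Something like this block driver, sketched with a stand-in compress_block() (the real LZF routine works buf->buf; the stub below just copies the data so the sketch compiles and runs, and the block size is a guess, not LZF's actual value):
Code:
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the real buf->buf compressor: stores the block uncompressed.
size_t compress_block(const uint8_t* in, size_t in_len, uint8_t* out) {
    memcpy(out, in, in_len);
    return in_len;
}

// Block-based driver: read a block, compress it in memory, then write a
// 32-bit compressed size followed by the compressed bytes - which is why
// the output is slightly larger than the pure stream.
void compress_file(FILE* fin, FILE* fout, size_t block_size = 16 << 20) {
    std::vector<uint8_t> in(block_size), out(block_size + block_size / 8 + 64);
    size_t n;
    while ((n = fread(in.data(), 1, block_size, fin)) > 0) {
        uint32_t csize = (uint32_t)compress_block(in.data(), n, out.data());
        fwrite(&csize, sizeof(csize), 1, fout);   // 32-bit compressed size per block
        fwrite(out.data(), 1, csize, fout);       // compressed payload
    }
}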


  30. #27
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    What is the penalty associated with getc/putc? stdio is supposed to buffer for you, but it doesn't seem to quite replace manual buffering. There isn't much documentation on how stdio buffers that I can find.

  31. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
    It's about "if" statement per each byte read/write. Writing own getc()/putc() help just a little. setvbuf() might slightly improve performance as well.
    In other words, if we deal with an algorithm that is about the same complexity as memcpy(), everything counts (even in small amounts )
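
This is roughly what "writing your own getc()" means - one branch per byte on the hot path, with a bulk fread() refill (buffer size and names here are illustrative, not taken from the LZF source):
Code:
#include <cstdio>
#include <cstddef>

// Tiny buffered reader: the hot path costs a single "if" per byte - exactly the
// per-byte overhead being discussed - while the slow refill runs once per BUF_SIZE bytes.
struct ByteReader {
    static const size_t BUF_SIZE = 64 * 1024;
    unsigned char buf[BUF_SIZE];
    size_t pos = 0, len = 0;
    FILE* f;

    explicit ByteReader(FILE* file) : f(file) {}

    int get() {                       // next byte, or -1 at end of file
        if (pos >= len) {             // the one branch per byte
            len = fread(buf, 1, BUF_SIZE, f);
            pos = 0;
            if (len == 0) return -1;
        }
        return buf[pos++];
    }
};
Calling setvbuf(f, nullptr, _IOFBF, 1 << 20) before the first read is the stdio-level alternative: it enlarges the library's own buffer instead of replacing it.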

  32. #29
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by encode View Post
    It's about "if" statement per each byte read/write. Writing own getc()/putc() help just a little. setvbuf() might slightly improve performance as well.
    In other words, if we deal with an algorithm that is about the same complexity as memcpy(), everything counts (even in small amounts )
    Okay. So the penalty is very small, then.

  33. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,004
    Thanks
    393
    Thanked 389 Times in 149 Posts
Well, I've just added a new super-cruncher mode, with an insane string parser, to the upcoming LZF v1.02. It's okay, since the purpose of the Maximum mode is maximum compression within a given format, at any cost. Normal mode will have about the same compression and a slightly higher compression speed. The new version will be fully compatible with the older one.

    Testing results:

[test results table not preserved in the archived post]



