
Thread: Ultra-fast LZ

  1. #31
    Member m^2
    Ślůnsk, PL
    Quote Originally Posted by Shelwien
    Damn.

    Suppose that we have some simplified LZ code:

    Code:
    struct {
      word offset;
      byte length;
      byte literal;
    } lzdata[n_lzrecords];
    Then there's the common way to decode it, the way
    most LZ decoders work: parsing and interpreting
    the "instructions" in a single pass:

    Code:
    byte* p = output_buffer;
    for( i=0; i<n_lzrecords; i++ ) {
      // copy the match byte by byte: source and destination may overlap when offset < length
      for( j=0; j<lzdata[i].length; j++, p++ ) *p = *(p - lzdata[i].offset);
      *p++ = lzdata[i].literal;
    }
    But it can also be implemented in a different way,
    with 2 passes: parsing/compiling and actual execution:

    Code:
    // code generation
    byte* q = code_buffer;
    *q++ = 0x97; // xchg edi,eax (assumes the output pointer arrives in eax)
    *q++ = 0x33; *q++ = 0xC9; // xor ecx,ecx (cleared once; rep movsb leaves it at 0)
    for( i=0; i<n_lzrecords; i++ ) {
      // lea esi,[edi-offset]
      *q++ = 0x8D; *q++ = 0xB7; (uint&)*q=-lzdata[i].offset; q += 4;
      *q++ = 0xB1; *q++ = lzdata[i].length; // mov cl,length
      *q++ = 0xF3; *q++ = 0xA4; // rep movsb
      *q++ = 0xB0; *q++ = lzdata[i].literal; // mov al,literal
      *q++ = 0xAA; // stosb
    }
    *q++ = 0xC3; // ret
    // uncompressed data generation
    ((void(*)(byte*))(void*)code_buffer)( output_buffer );
    The idea is that there could be a speed improvement: this way there is
    no inner loop in the parsing pass, so the C++ compiler can generate more
    efficient code for it, and there are only 2 data locations involved at
    each stage, instead of 3 in a normal LZ decoder, so the L1 data cache
    can be used more efficiently.

    Also, while it may be a bit hard to observe a real speedup from
    this trick in a single-threaded decoder (the requirements would be
    something like lzdata size > 1M and uncompressed data size > 100M),
    it certainly makes sense on a multicore system, as it allows running
    the parsing and the uncompressed-data generation in parallel.
    OK, got it this time. I'm not certain that it would be faster on a single-threaded machine, but I love how embarrassingly parallel it is... except for code generation, which could be made parallel enough with a slight modification of the LZ stream (an uncompressed_size_up_to_now field inserted every so often). Though then a regular C++ version could be parallel too... and I like this idea a lot. I might implement it one day.
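
    A minimal sketch of that stream modification (illustration only; the struct and field names are invented): if each chunk of LZ records starts with the uncompressed size produced so far, a worker knows exactly where its output goes, so chunks can be decoded (or compiled) independently.

    Code:
    #include <cstdint>

    struct LZRecord { uint16_t offset; uint8_t length; uint8_t literal; };

    struct ChunkHeader {
      uint32_t uncompressed_offset; // the "uncompressed_size_up_to_now" checkpoint
      uint32_t n_records;           // number of LZ records in this chunk
    };

    // Decode one chunk at its known output position; chunks can go to separate threads.
    // Caveat: matches may still reference bytes produced by earlier chunks, so a worker
    // must not run ahead of the data it depends on (or offsets must stay inside the chunk).
    void decode_chunk(const ChunkHeader& h, const LZRecord* rec, uint8_t* output_buffer)
    {
      uint8_t* p = output_buffer + h.uncompressed_offset;
      for (uint32_t i = 0; i < h.n_records; i++) {
        for (uint32_t j = 0; j < rec[i].length; j++, p++) // overlap-safe byte copy
          *p = *(p - rec[i].offset);
        *p++ = rec[i].literal;
      }
    }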

  2. #32
    Expert Matt Mahoney
    Melbourne, Florida, USA

  3. #33
    The Founder encode
    Moscow, Russia
    Thanks a lot!

  4. #34
    The Founder encode
    Moscow, Russia
    Just designed a new LZ compressor that is aimed at the fastest possible decompression paired with a high compression ratio.

    In brief, it is an LZSS with a 1 MB or 2 MB dictionary and Optimal Parsing. No Huffman or bit I/O, just the simplest byte-aligned output for maximum speed. For comparison, my LZSS with optimized parsing has just a 256 KB dictionary, and ULZ has a 512 KB dictionary but its output is completely unoptimized (greedy parsing) even in max compression mode. The new LZ will have more advanced output coding and more advanced optimization...

  5. #35
    Member
    Portugal
    Glad to hear it, genie,
    keep up the fantastic work.

    Alvaro

  6. #36
    The Founder encode
    Moscow, Russia
    Brief testing of LZSS with 1 MB dictionary on ENWIK8:

    No code has to be inserted here.

    To be honest, I'm not that impressed by the results. An encoder with a 1 MB window cannot be considered among the fastest, and at the same time the compression is not much better than Deflate's. The only advantage is the world's fastest and simplest decompressor.

  7. #37
    Member
    Germany
    Lazy matching is not optimal matching.

    In optimal matching you construct a graph, so you have to run the longest-match search at every position. Then you search the graph for the shortest (least-cost) path from the beginning to the end. You can speed this up by exploiting the property that every possible route has to go through a literal (see the sketch after the results below).

    Code:
    compression of book1 (768.771 bytes)
    267.783 bytes: (lazy matching with max-dist: 4 bytes, matching took 3s)
    262.058 bytes: (optimal matching using constant costs for match/literal, matching took 5s)
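
    As a rough sketch of that least-cost-path parse with constant costs (illustration only; the cost values and the find_longest_match helper are placeholders):

    Code:
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Assumed helper (not shown): returns the length of the longest match starting at
    // 'pos' (never reaching past 'size') and stores its offset; returns 0 if none.
    size_t find_longest_match(const uint8_t* data, size_t pos, size_t size, uint32_t* offset);

    // Backward DP: cost[i] = cheapest number of output bits for data[i..size).
    // step[i] = 1 for a literal, otherwise the chosen match length.
    void optimal_parse(const uint8_t* data, size_t size,
                       std::vector<uint32_t>& step, std::vector<uint32_t>& match_offset)
    {
      const uint64_t LITERAL_COST = 9;  // constant cost: flag bit + 8-bit literal
      const uint64_t MATCH_COST   = 17; // constant cost: flag bit + 16-bit match code
      const size_t   MIN_MATCH    = 3;

      std::vector<uint64_t> cost(size + 1, 0);
      step.assign(size + 1, 1);
      match_offset.assign(size + 1, 0);

      for (size_t i = size; i-- > 0; ) {
        // a literal is always possible - every route goes through literals
        cost[i] = LITERAL_COST + cost[i + 1];
        step[i] = 1;

        uint32_t off = 0;
        size_t maxlen = find_longest_match(data, i, size, &off);
        // any prefix of the longest match is itself a valid match, so try all lengths
        for (size_t len = MIN_MATCH; len <= maxlen; len++) {
          uint64_t c = MATCH_COST + cost[i + len];
          if (c < cost[i]) { cost[i] = c; step[i] = (uint32_t)len; match_offset[i] = off; }
        }
      }
      // the chosen parse is recovered by walking forward: i += step[i]
    }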

  8. #38
    The Founder encode
    Moscow, Russia
    It's funny. For sure you're not familiar with my software and you don't know who I am.

  9. #39
    Member
    Germany
    On 30th June you talked about LZ with optimal matching, and now you're presenting results with sub-optimal matching, which often gets labelled "optimal" anyway. I just thought I could be of some help, so never mind if you think you already know everything.

  10. #40
    The Founder encode
    Moscow, Russia
    The thing is, I already released LZSS with an Optimal Parsing:
    http://encode.su/threads/143-LZSS-v0.01-is-here!

    ULZ is about fast compression as well. That's why I decided not to make the max compression mode too complex. All in all, the new ULZ should compete with LZO, beating it with more advanced parsing and/or a larger window size. I'm currently searching for the most efficient byte-aligned LZ codes; the classic LZSS approach is far from the best. These days I "decrypted" the LZO source to see how it works. Nice and well-known ideas can be found there.

    OK, what I want with my simplest LZ:
    • It should have a large enough window/dictionary - 64K ... 128K or even 256K
    • It should be able to compress small matches - MIN_MATCH = 3
    • No Huffman, no Bit I/O and of course no Arithmetic coder


    MIN_MATCH of 3 means that we should have small 2-byte LZ codes. Thus, match lengths and/or offsets should be written using variable-length integer codes.

    With match length it's pretty easy. I may store the length in, say, 3 or 4 bits, reserving a special value that means more bits follow:
    No code has to be inserted here.
    Match offsets can be stored in different bit lengths, depending on special flags and/or match length.

    Literals are stored in 8 bits. Literal runs should be used, i.e. a control byte represents how many literals should be copied.

    About flags. The classic LZSS approach is bad (1 byte represents 8 match/literal flags). As with LZO, we should do trickier things: keep one Control byte aka Marker (in LZO terms) aka Tag that may represent the match length, the literal-run length, and may also contain some bits of an LZ code...
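
    For illustration only (not the final ULZ format, just one possible layout): pack the literal-run length in the high nibble of the tag and the match length in the low nibble, with the nibble value 15 playing the role of the "special value that means more follows" mentioned above, and put a 16-bit offset after the literals. The decoder for such a made-up layout stays tiny:

    Code:
    #include <cstdint>

    // Hypothetical byte-aligned format: [tag][run ext.][literals][len ext.][offset lo][offset hi] ...
    void decode(const uint8_t* in, const uint8_t* in_end, uint8_t* out)
    {
      const uint32_t MIN_MATCH = 3;
      while (in < in_end) {
        uint8_t tag = *in++;
        uint32_t run = tag >> 4;                      // literal-run length, 15 = escape
        if (run == 15) { uint8_t b; do { b = *in++; run += b; } while (b == 255); }
        while (run--) *out++ = *in++;                 // copy the literal run
        if (in >= in_end) break;                      // stream may end after literals
        uint32_t len = (tag & 15) + MIN_MATCH;        // match length, 15 = escape
        if ((tag & 15) == 15) { uint8_t b; do { b = *in++; len += b; } while (b == 255); }
        uint32_t off = in[0] | (in[1] << 8); in += 2; // 16-bit match offset
        while (len--) { *out = *(out - off); out++; } // overlap-safe copy
      }
    }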

  11. #41
    Member
    Germany
    Never mind, I lurked around the forum and saw your other programs; I think you started this site.
    Maybe it's possible for you to share the strongest LZSS approach you have (compression-wise)?
    I would be interested in its performance.

    Regards

  12. #42
    The Founder encode
    Moscow, Russia
    Not sure about the strongest LZSS.

    LZPM 0.16:
    http://encode.su/threads/215-lzpm-v0.16-is-here!

    Nice experimental LZ77, not LZSS though.

    The current ULZ is a nice thing, but it has just Greedy Parsing, even at the max compression level. My LZSS 0.01 has Optimal (for LZSS) Parsing and the simplest LZ output; such output is not that good for compression.

  13. #43
    Member
    Germany
    Thanks for the link. It seems that you're trying to achieve different goals with every little program (good parsing is important in every LZ approach), but I am almost only interested in the maximum achievable with a specific algorithm.

    Maybe I should start a thread where others can contribute ideas? I slightly (~1.5%) beat LZPM on Calgary, but comparisons only make sense if you compare related algorithms.

  14. #44
    The Founder encode
    Moscow, Russia
    Anyone can post ideas about LZ77 right here!

    Actually, I'm experimenting with some extra LZ ideas. In particular, an LZ coder with tricky offsets/match indexes.

  15. #45
    The Founder encode
    Moscow, Russia
    Quote Originally Posted by Sebastian
    I slightly (~1.5%) beat LZPM on Calgary, but comparisons only make sense if you compare related algorithms.
    If you compared LZPM to something other than your own LZ77, then compare your program to PIMPLE or BCM.

  16. #46
    Member
    Germany
    Extra idea? If you do greedy parsing, you can be sure that the literal following a match is not the symbol following the matched reference (see the sketch below). But your entropy coding design is very limited, so this might be of little help.
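
    For reference, a minimal sketch of that exclusion (my illustration, not LZPX's or ULZ's code): since the literal after a greedy match cannot equal the byte that would have extended the match, it can be remapped into a 255-symbol alphabet. With plain byte output this saves nothing by itself; it only pays off with an entropy coder or if the freed value is reused for something else.

    Code:
    #include <cstdint>

    // excluded = out[pos - offset], the byte that would have extended the previous match.
    // Greedy parsing guarantees literal != excluded (unless the match was length-capped).
    uint8_t encode_excluded_literal(uint8_t literal, uint8_t excluded)
    {
      return (literal < excluded) ? literal : (uint8_t)(literal - 1); // result fits 0..254
    }

    uint8_t decode_excluded_literal(uint8_t code, uint8_t excluded)
    {
      return (code < excluded) ? code : (uint8_t)(code + 1);
    }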

    And btw, BCM is BWT, right? A comparison would not make sense for me.

  17. #47
    The Founder encode
    Moscow, Russia
    Extra ideas means ideas on how to remove LZ-offset redundancy: a la LZP, ROLZ, and something new that works well with byte-aligned output... Be sure that I know all the tricks you know and even more... The trick you posted I already used in LZPX back in 2005...

  18. #48
    Member
    CN
    I think pigz is the fastest

  19. #49
    The Founder encode
    Moscow, Russia
    Not really...

    Well, finally I've implemented something really new! I've invented a trick for building, on the fly, a large enough dictionary (> 8 MB) while keeping tiny dictionary indexes (< 14 bits). The disadvantage is that the string searching is not that deep, but for the fastest coders it's a supreme technique. Currently my dummy coder completely outperforms LZOP. Continuing to experiment...

  20. #50
    Member m^2
    Ślůnsk, PL
    Sounds good.

  21. #51
    The Founder encode
    Moscow, Russia
    These days I implemented all of the LZ-family coders with byte-aligned output (LZSS, ROLZ, LZP, LZRW, ...) and compared them to each other. I've done something similar before, but this is the biggest test ever. And again I've found that context-related stuff like LZP or ROLZ works badly with the simplest LZ output. Of course, I compared all of these to well-known coders like LZOP and the current version of ULZ.

    Strictly speaking, for the LZ with the highest compression and the simplest, byte-aligned output, we need plain LZ77 and nothing else: deep string searching, a large enough dictionary and preferably S&S parsing. But in that case the encoder will be too slow (although the decoder will be the fastest). Making a fast LZ77 (like ULZ c1) results in too-redundant LZ codes. Some attempts were made with LZRW, but that was 20 years ago.

    To be trickier, we can build a small dictionary with no redundant/duplicate strings for fast string matching. The catch is how to build a proper dictionary, and build it fast, since the decoder has to maintain this dictionary as well. And I did something interesting in this area...

  22. #52
    Member polemon
    Germany
    Can we get the source code for ulz002? I'd like to compile it under Linux.

  23. #53
    The Founder encode
    Moscow, Russia
    Quote Originally Posted by polemon
    Can we get the source code for ulz002? I'd like to compile it under Linux.
    ULZ 0.02 must be ported to Linux first. It was developed under Visual Studio 2008 and uses some M$-specific functions like _ftelli64() to handle huge files... If I have some time I'll do that port. Anyway, the unreleased ULZ (or probably I'll release a new program called HLZ) is far more interesting. Furthermore, I'll probably drop the huge-file support (64-bit file sizes) to keep it compatible with GCC and other non-M$ compilers.
    It would be cool to write some HDD driver with that compression. Or even produce some hardware-driven data compression module (CPU -> DATA PROCESSOR -> STORAGE MEDIA)...

  24. #54
    Member polemon
    Germany
    Depending on how large your code is, I could port it to Linux if you want. Just let me know, and I'll see what I can do.

  25. #55
    The Founder encode
    Moscow, Russia
    Compared the new algorithm vs LZSS. Both have the same output, including the offset/index size, and both algorithms do just one string match at each step. Well, the difference is really huge! However, the new algorithm adds some extra work to the decompressor, so its decoder cannot be as fast as LZSS's. Anyway, the new algorithm compresses notably faster and the compression ratio is notably better. Slightly slower decompression is just the cost of the new features! Such a trade-off makes sense...

  26. #56
    The Founder encode
    Moscow, Russia
    Working on ULZ 0.03!

    I already have a draft version. And it's gonna be the best LZSS/LZ77 I have ever made.

    Goals:
    + Fastest and simplest decoder possible
    + Compression on par with Deflate or higher. ULZ has no Huffman, but it has a larger dictionary and better parsing

    Compared to ULZ 0.02:
    + New ULZ has a smaller dictionary (128K vs 512K)
    + More advanced parsing (Lazy Matching with 1/2-byte lookahead vs Greedy Parsing)
    + More advanced LZ-output coding (Bit I/O vs Aligned Byte I/O)
    + ULZ 0.03 is able to compress much smaller strings across the full range of the dictionary. So it CAN compress strings of length three. ULZ 0.02, in contrast, can only compress strings of length four and up, and within a limited range
    + New ULZ can produce larger data expansion on files like a10.jpg

    New ULZ will have a fast compression mode as well. Currently I have three compression modes in mind:

    0 - Fast mode. Limited string searching. Greedy Parsing.
    I have a few versions of this mode:
    Fastest - 49007012 bytes on ENWIK8
    Fast1 - 46090162 bytes
    Fast2 - 43790930 bytes
    1 - Normal mode - 39774828 bytes (Deep string searching. Greedy Parsing)
    Technically, this mode is similar to the max mode of ULZ 0.02. For comparison, ULZ 0.02 is two times slower in that mode than the new ULZ.
    2 - Max mode. 38xxxxx or higher compression (Deep string searching. Advanced Parsing)

    New ULZ doesn't show its full potential on ENWIKs, due to its smaller dictionary. However, on other files it is much more interesting...

  27. #57
    Member m^2
    Ślůnsk, PL
    The announcement put one word in my mind:
    ZFS.
    Sounds like a good candidate for filesystem compression. And ZFS allows blocks of up to 128 KB.

  28. #58
    Member
    France
    Well m^2, filesystem compression is an area I'm very interested in.
    Is there any place to start?

  29. #59
    Member m^2
    Ślůnsk, PL
    Quote Originally Posted by Cyan
    Well m^2, filesystem compression is an area I'm very interested in.
    Is there any place to start?
    What do you want to know? There is no guide. Ask specific questions and you may get answers.

  30. #60
    Member
    France
    Here is an example of something similar I achieved quite some time ago with Bruno for an (old) handheld calculator:
    http://phantasie.tonempire.net/t75-s...sion-driver#83

    I've been trying to do the same on a Windows system, but without success up to now.
    To put it simply, I would like to find a way to replace (or live alongside) the compression driver provided within Windows. Obviously, not a simple objective.

    Probably Windows can be considered a less open system than BSD or Linux, so maybe one part of the answer is to move to *nix. For this reason I've been looking into ZFS. But where is the part that deals with compression?

    This source tree is really complex.
    Ideally, there should exist some kind of "in-out" interface. After all, it is up to the file system to handle sectors and files. The compressor could remain fairly agnostic of its usage, as long as it respects some basic conditions on memory usage.
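
    As a sketch of what such an "in-out" interface could look like (purely hypothetical, not ZFS's actual API; all names are invented): the filesystem hands the compressor one block and one bounded destination buffer at a time, and the compressor keeps no state between calls.

    Code:
    #include <cstddef>

    // Hypothetical per-block compression hook. Returning 0 from compress means
    // "store the block uncompressed" (the output did not fit into dst_capacity).
    typedef size_t (*fs_compress_fn)(const void* src, size_t src_len,
                                     void* dst, size_t dst_capacity);
    typedef size_t (*fs_decompress_fn)(const void* src, size_t src_len,
                                       void* dst, size_t dst_capacity);

    struct fs_compressor {
      const char*      name;      // e.g. "my-fast-lz"
      size_t           max_block; // largest block the codec accepts (128 KB records in ZFS)
      fs_compress_fn   compress;
      fs_decompress_fn decompress;
    };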

    Maybe this interface exists in ZFS; I just can't find it.

    Ideally, I would like to plug in a very fast compressor, suitable for SSD usage.
