
Thread: LZSS v0.01 is here!

  1. #31
    Member
    Join Date
    May 2008
    Location
Antwerp, Belgium, W. Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Cyan View Post
Nevermind,
I think I've got one scenario where delaying the match decision does pay off:
More about (optimal) parsing: http://www.cbloom.com/algs/dictionary.html

  2. #32
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
Thanks for the link, this was a very interesting read.
At least now I can put correct names on these concepts.

So I guess the next step is to spend some time on lazy matching; it looks like the "level 2" difficulty, while optimal parsing seems a bit further down the road.

    Regards

    Yann

  3. #33
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts

Feedback

Just wanted to give some feedback on the "lazy matching" experiment.

It appears that just using a "simple" one-symbol deferred decision immediately brings tangible benefits, anywhere from 0.5% to 7% per file, with a fair majority in the 2.5% vicinity. The compressed result is even a bit faster to decompress.
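For illustration, here is a minimal sketch in C of that one-symbol deferred decision. find_longest_match(), emit_literal(), emit_match() and MIN_MATCH are placeholders for whatever the encoder already provides; only the control flow matters:
Code:
#include <stddef.h>

typedef struct { size_t offset, len; } match_t;

/* Placeholders for routines the encoder already has: */
match_t find_longest_match(const unsigned char *src, size_t pos);
void    emit_literal(unsigned char c);
void    emit_match(size_t offset, size_t len);
#define MIN_MATCH 3

void compress_lazy(const unsigned char *src, size_t src_len)
{
    size_t p = 0;
    while (p < src_len) {
        match_t m0 = find_longest_match(src, p);
        if (m0.len >= MIN_MATCH && p + 1 < src_len) {
            match_t m1 = find_longest_match(src, p + 1);
            if (m1.len > m0.len) {        /* deferring by one symbol pays off */
                emit_literal(src[p]);     /* emit the current byte as a literal */
                p++;                      /* and re-evaluate from the next position */
                continue;
            }
        }
        if (m0.len >= MIN_MATCH) { emit_match(m0.offset, m0.len); p += m0.len; }
        else                     { emit_literal(src[p]); p++; }
    }
}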

This is nothing to sneeze at, and indeed a lot for such a simple modification of the encoder. As you know, the decoder is unmodified, and consequently neither is the format.

I guess that experimenting with "deeper" lazy matching will bring smaller benefits (law of diminishing returns), but this is already a pretty good step in the right direction.

So I want to thank you for providing some very good hints, helping this solution achieve even better results.

    Regards

    Yann
    Last edited by Cyan; 5th September 2008 at 04:52.

  4. #34
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts

    Improving compression speed with Hash tables

    Hello Everyone

Still developing my first compressor for handheld calculators, I've been having a look at compression speed lately (only decoding had been optimised up to now).

To speed up compression, I've been using a simple "hash dispatch" algorithm - something crude, but which immediately produced good performance benefits.

As it stands, my compressor is roughly comparable in speed with other compressors on the same target system, as illustrated by the benchmarks here:
http://LZD.webhop.org

I do, however, have a huge performance penalty when searching through large chunks of zeroes. This case does happen when dealing with black & white pictures (blank areas).

To understand why, I have to explain my algorithm:
At each symbol position, I generate a hash, which is then written into a dispatch table. From there, I generate a link into a "memory table". In this second table, each memory position keeps a link to the previous position with the same hash.

Quite a simple "chain" methodology.
Now for the problem: when there are a lot of zeroes, a lot of hashes are identical. Well, my hash of zeroes is zero...
Therefore, the "chain of addresses" with the same hash becomes very long. Very often, each address just links to the previous address. Of course, going into the table just to retrieve the previous address is a lot slower than simply computing "position - 1".
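For reference, the structure described above looks roughly like this in C (sizes, names and the hash are illustrative, not the actual calculator code; end-of-buffer checks are omitted). The MAX_CHAIN cap shown is a commonly used work-around that bounds how far a chain is followed per position:
Code:
#include <stdint.h>
#include <string.h>

#define HASH_BITS   12
#define HASH_SIZE   (1 << HASH_BITS)
#define WINDOW_SIZE 65536             /* illustrative dictionary size (power of two) */
#define MAX_MATCH   255
#define MAX_CHAIN   64                /* cap on chain steps: bounds the worst case (runs of zeroes) */

static int32_t head[HASH_SIZE];       /* hash -> most recent position with this hash      */
static int32_t prev[WINDOW_SIZE];     /* position -> previous position with the same hash */

static void mf_init(void) { memset(head, -1, sizeof head); }   /* -1 = empty chain */

static uint32_t hash3(const uint8_t *p)   /* any three zero bytes hash to 0, hence the long chains */
{
    return ((p[0] << 8) ^ (p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
}

/* Insert position 'pos' into its chain and return the best earlier match position (or -1). */
static int32_t insert_and_search(const uint8_t *buf, int32_t pos, int *best_len)
{
    uint32_t h = hash3(buf + pos);
    int32_t  cand = head[h], best = -1;
    int      steps = 0;

    *best_len = 0;
    while (cand >= 0 && pos - cand < WINDOW_SIZE && steps++ < MAX_CHAIN) {
        int len = 0;
        while (len < MAX_MATCH && buf[cand + len] == buf[pos + len]) len++;
        if (len > *best_len) { *best_len = len; best = cand; }
        cand = prev[cand & (WINDOW_SIZE - 1)];
    }
    prev[pos & (WINDOW_SIZE - 1)] = head[h];   /* link this position in front of its chain */
    head[h] = pos;
    return best;
}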

OK, this behaviour seems fine, just a logical consequence of the selected algorithm.
But then, comparing performance with other compressors, I see that most of them do not suffer from the same performance hit.
So I guess I'm doing something wrong.

I believe that creating a "special code" for zeroes is not a clean work-around. I expect that a well-known solution already exists which deals elegantly with this side effect. Or maybe the algorithm itself is not so good and should be replaced with something more efficient.

Any idea?

    Thanks
    Last edited by Cyan; 17th September 2008 at 02:48.

  5. #35
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
Continuing to experiment with my super simple LZSS. I've already done lots of experiments with different LZ-output codings - variable offset and match-length coding. Anyway, I stayed with the current fixed-size LZ codes (a fixed 24-bit code for offset and length).

    Check out some results with different dictionary sizes and ENWIK8:
    Code:
     64K: 43,745,108 bytes
    128K: 40,854,057 bytes
    256K: 38,254,303 bytes
    512K: 36,160,910 bytes
      1M: 35,246,036 bytes
    
     ZIP: 36,422,375 bytes (For reference)

world95.txt:
    Code:
     64K: 961,142 bytes
    128K: 862,790 bytes
    256K: 793,907 bytes
    512K: 774,622 bytes
      1M: 852,558 bytes
    
     ZIP: 862,824 bytes (For reference)
    3200.txt:
    Code:
     64K: 7,430,726 bytes
    128K: 6,845,232 bytes
    256K: 6,345,279 bytes
    512K: 5,931,152 bytes
      1M: 5,614,100 bytes
    
     ZIP: 6,201,591 bytes (For reference)

bible.txt:
    Code:
     64K: 1,372,518 bytes
    128K: 1,258,563 bytes
    256K: 1,165,422 bytes
    512K: 1,110,041 bytes
      1M: 1,126,417 bytes
    
     ZIP: 1,175,510 bytes (For reference)
    *ZIP is from WinRAR 3.51 - Best method.

The 512K version looks pretty cool! Note that each doubling of the dictionary halves MAXMATCH, since offset and length share the fixed 24-bit code. That's why 512K can be better than 1M...
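To make the trade-off concrete, here is a sketch of a fixed 24-bit code in which offset and length share the 24 bits (the real field order in LZSS isn't shown in this thread, so the layout is an assumption): every extra offset bit is taken from the length field, so going from a 19-bit (512K) to a 20-bit (1M) offset halves MAXMATCH.
Code:
#include <stdint.h>

/* Assumed layout: high bits = offset, low bits = length, 24 bits total. */
#define DICT_BITS 19                         /* 512K dictionary */
#define LEN_BITS  (24 - DICT_BITS)           /* 5 bits left for the length */
#define MAXMATCH  ((1 << LEN_BITS) - 1)      /* doubling the dictionary halves this */

static uint32_t pack_lz(uint32_t offset, uint32_t length)
{
    return (offset << LEN_BITS) | length;    /* 24-bit value, stored as 3 bytes */
}

static void unpack_lz(uint32_t code, uint32_t *offset, uint32_t *length)
{
    *length = code & MAXMATCH;
    *offset = code >> LEN_BITS;
}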

  6. #36
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Nice results!

  7. #37
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
Looks good, proof that bigger isn't always better!

And it seems that when the 1M version is smaller, it wins by a smaller percentage than when the 512K version is smaller, if that makes sense. I.e. overall, 512K looks like the better compromise on those samples.

  8. #38
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If you'd like to keep your fixed 24 bit codes you can still make them variable. You just need to fit different quantisations of (offset,length) into 24 bits.

    On the other hand longer matching strings usually appear over greater distances. You might want to try it...

  9. #39
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by toffer View Post
    If you'd like to keep your fixed 24 bit codes you can still make them variable. You just need to fit different quantisations of (offset,length) into 24 bits.

    On the other hand longer matching strings usually appear over greater distances. You might want to try it...
NTFS compression uses a nice trick - at the beginning of a chunk, the code space for the offset is smaller than at the end. The reason: at position 1024, say, we cannot get a match at an offset larger than 1024; moving forward, the range of available offsets grows. The unused space may be used for the match length, for example. It's something like the flat codes in LZW (or LZC - UNIX compress). Anyway, this trick may hurt decoder simplicity. Note that in my case, the decoder is really simple and fast...
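The same idea as a sketch (the 16-bit token and 4K chunk constants follow the NTFS scheme only loosely; treat them as illustrative rather than a spec):
Code:
#include <stdint.h>

/* The further we are into the chunk, the more bits the offset needs; whatever
   is left of the 16-bit token goes to the match length. */
static unsigned offset_bits_for(unsigned pos_in_chunk)
{
    unsigned bits = 4;                       /* minimum offset width */
    while ((1u << bits) < pos_in_chunk)      /* grow until the offset can reach back to the start */
        bits++;
    return bits;
}

static uint16_t pack_token(unsigned pos_in_chunk, unsigned offset, unsigned length)
{
    unsigned ob = offset_bits_for(pos_in_chunk);
    unsigned lb = 16 - ob;                                    /* unused offset space reused for length */
    return (uint16_t)(((offset - 1) << lb) | (length - 3));   /* lengths start at 3 */
}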

  10. #40
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
I think I should look more deeply into variable-length LZ codes - say, 16/24-bit codes or even 8-bit codes.

    For example:

    Small LZ-code (16-bit):
    1-bit - Small/Large Flag (0)
    3-bit - Match Length
    12-bit - Match Offset

    Large LZ-code (24-bit):
    1-bit - Small/Large Flag (1)
    6-bit - Match Length
    17-bit - Match Offset

And here my optimal parsing rules come in - we may build more complex combinations. Such an encoding is more efficient for EXE files, since we may encode small matches (3 bytes long). However, it's not always better. First of all, we lose one bit in the large code - we lose dictionary size or maximum match length. Actually, other simple approaches like this one are not really better.
We may use a fixed match-length code and a variable offset code:

    Close Match:
    1-bit - Distance Flag (0 - Close)
    5-bit - Match Length
    10-bit - Match Offset

    Far Match:
1-bit - Distance Flag (1 - Far)
    5-bit - Match Length
    18-bit - Match Offset

In some cases such a scheme is better than the one described above...
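For illustration, a sketch of how an encoder could emit such close/far codes (the field order inside the code word is an assumption, not the actual layout). The decoder only has to peek at the top bit of the first byte to know whether one or two more bytes follow:
Code:
#include <stdint.h>

/* Close match: flag 0 + 5-bit length + 10-bit offset = 16 bits
   Far match:   flag 1 + 5-bit length + 18-bit offset = 24 bits
   Returns the number of bytes written. */
static unsigned emit_match_code(uint8_t *out, unsigned offset, unsigned length)
{
    if (offset < (1u << 10)) {                                 /* close: fits in 16 bits */
        uint16_t code = (uint16_t)((length << 10) | offset);   /* top bit stays 0 */
        out[0] = (uint8_t)(code >> 8);
        out[1] = (uint8_t)(code & 0xFF);
        return 2;
    } else {                                                   /* far: needs 24 bits */
        uint32_t code = (1u << 23) | ((uint32_t)length << 18) | offset;
        out[0] = (uint8_t)(code >> 16);
        out[1] = (uint8_t)(code >> 8);
        out[2] = (uint8_t)(code & 0xFF);
        return 3;
    }
}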

Such simple stuff is too file-dependent...

Maybe using real variable-length codes is the answer. Again, the question is decoder simplicity.

By the way, I also played a lot with tagged output:

    0..127 - Count for literal raw run (1..128 literals)
    128..255; 0..65535 - Match length/Match position

Such a scheme deals nicely with already compressed data a la A10.jpg; however, in general it's worse, because mostly we deal with short literal runs (< ...), and in this case the simple 0/1 flag idea works better...


  11. #41
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts

    8-bit codes

    >or even 8-bit codes
    Small LZ-code (8-bit):
    1-bit - Small/Mid Flag (1)
    1-bit - Match Length 2/3 bytes
    6-bit - Match Offset

    Mid LZ-code (16-bit):
    1-bit - Small/Mid Flag (0)
    1-bit - Mid/Large Flag (1)
    3-bit - Match Length
    11-bit - Match Offset

    Large LZ-code (24-bit):
    1-bit - Small/Mid Flag (0)
    1-bit - Mid/Large Flag (0)
    5-bit - Match Length
    17-bit - Match Offset

  12. #42
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
I've been toying with a 12-bit code,
mainly because the processor is nibble-aligned (4 bits),
which is rather unusual, most processors being byte-aligned (8 bits).

Well, anyway, with such a small code,
it was not so easy to find the best matchlength/offset combination.

I got it right when I accepted introducing a variable code length for the match length.
I had much less success when trying to do the same for the offset.

It resulted in a 9-3 [Offset-Matchlength] combination.
If the match length is larger than MaxMatch, then I simply append another symbol, which is added to the previous one.
This method can be proven to always be favourable for compression ratio, compared to not using it.
And it is very simple.
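To illustrate, a sketch of the decoding side of that extension scheme (the 4-bit extension unit below is an assumption for the sake of the example; the real calculator code obviously differs):
Code:
#include <stdint.h>

/* 12-bit LZ code: 9-bit offset + 3-bit match length.
   When the length field holds its maximum value, extension symbols are read
   and added until a non-maximal one appears, so short matches cost nothing extra. */
#define LEN_BITS 3
#define LEN_MAX  ((1u << LEN_BITS) - 1)    /* 7 */
#define EXT_MAX  0xFu                      /* assumed 4-bit (nibble) extension symbol */

static unsigned decode_length(uint16_t code, unsigned (*read_nibble)(void))
{
    unsigned len = code & LEN_MAX;         /* low 3 bits = base length */
    if (len == LEN_MAX) {                  /* maximal value: keep reading extensions */
        unsigned ext;
        do {
            ext = read_nibble();
            len += ext;
        } while (ext == EXT_MAX);
    }
    return len;
}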

The added complexity is very low on the decoder (and the coder) side. To give an idea, the decoder is only 205 bytes long, including memory management & error cases. So this seems reasonably simple.
With low complexity comes speed; as it stands, the decoder is the fastest produced on the target system, just by respecting "simplicity" rules.

If interested, all results are documented here:
    http://phantasie.tonempire.net/utili...are-t65.htm#67

I have to thank encode.su for pointing this project in the right direction.
    Last edited by Cyan; 10th December 2008 at 06:34.

  13. #43
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by m^2 View Post
    I am interested.
    What do you like in the LZSS the most? I should know for the future LZSS improvements...

  14. #44
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    What do you like in the LZSS the most? I should know for the future LZSS improvements...
    Obviously decompression speed. And efficiency. You say small decompression stub: it's cool too, but executable compression is the only case I see where it really matters.

I think you should decide one thing: do you want to tune for maximum in-memory decompression efficiency, or for decompression from slower media (hard drives...)? Currently LZSS is too fast for HDDs and could get a better size at about the same speed.

  15. #45
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by m^2 View Post
    Obviously decompression speed. And efficiency. You say small decompression stub: it's cool too, but executable compression is the only case I see where it really matters.
    Initially it was designed for EXE packers...

    Quote Originally Posted by m^2 View Post
I think you should decide one thing: do you want to tune for maximum in-memory decompression efficiency, or for decompression from slower media (hard drives...)? Currently LZSS is too fast for HDDs and could get a better size at about the same speed.
I have a few tricks that will improve compression with no actual speed loss, at the cost of a slightly larger decoder (this matters for ASM stubs only). In addition, I may add variable-length coding, like Gamma coding, to get higher compression...

  16. #46
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    Initially it was designed for EXE packers...
It looks like it was.
Actually, at some point I'd been thinking about asking you to incorporate it into UPX, but calculations showed that because it's weaker than NRV2, you'd get a higher I/O penalty and the startup time would be longer, though LZSS has lower CPU usage.
Maybe some tricks like E8E9 would help? (I'm just a noob unsure of my own guesses. I'd gladly hear whether they are right.)

  17. #47
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
Check out the source of the original LZSS v0.01! But please note that the source is quite a rough draft. Anyway, enjoy!

  18. #48
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
Check out the source of the original LZSS v0.01! But please note that the source is quite a rough draft. Anyway, enjoy!
    Thanks. Can't wait to have a look.

  19. #49
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
VC8 (that's what you used, right?) does a really good job of optimizing this program. It's 7% faster than gcc 3.4.5 and 4.5% faster than VC6.

  20. #50
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
Just written the world's fastest LZSS. The decoder has nearly the complexity of a plain copy - i.e. mostly no additional arithmetic operations are needed (unlike with tags or other byte-aligned LZSS schemes). Not sure it is possible to do anything simpler. The encoder is also simple enough and designed for speed. Maybe I will finally release a competitor to QuickLZ/FastLZ/LZOP/etc. Currently I'm just playing with various configurations, thinking about a new program name, etc.

  21. #51
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Some goals of the new piece of software:
• Simplest decoder - as simple as possible, so even beginner programmers can write their own decoder using any programming tool - Delphi, Basic, C/C++, ASM, etc. Thus anyone is able to incorporate a small piece of code into their projects. In addition, it should be able to run on modest hardware, such as calculators and portable game consoles.
• Streamed stability on already compressed data - i.e. with no additional headers and stored blocks.
SimpleLZ? FastLZ/QuickLZ/BriefLZ are already taken. Or something that consists of "LZ" and "copy", because all the decoder does is copy - a run of literals or a match (N bytes from offset X).

I haven't completely decided on the LZ-output format and dictionary size (128K/256K). Also not decided about the encoder. The fast encoder leaves too much air in the output - too many matches are missed. The fully optimized S&S-based encoder is too slow. A standard hash-chain-based encoder with greedy and/or lazy matching looks OK.


  22. #52
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
I just can't imagine how it could be any simpler. The LZSS 0.01 decoder is simple and fast enough.

  23. #53
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
LZSS v0.01 uses tags - i.e. one tag byte represents eight literal/match flags. Thus, for each literal/match we need to extract a flag bit. With the run scheme we may decode 128 literals at once. Also, such a scheme is more stable on compressed data such as "a10.jpg" (1 extra byte per 128 bytes vs 1 extra byte per 8 bytes). The disadvantage is some compression penalty. Anyway, I'm still experimenting...
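For reference, the classic tag-byte loop looks roughly like this (a generic sketch, not the actual LZSS v0.01 source; the 12+4 match layout, the flag polarity and the omitted input validation are just for the example). The run scheme simply replaces the per-item flag bit with one count byte per run of up to 128 literals:
Code:
#include <stdint.h>
#include <stddef.h>

static void decode_tagged(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t tag = in[ip++];                     /* 8 literal/match flags */
        for (int bit = 0; bit < 8 && ip < in_len; bit++) {
            if (tag & (1 << bit)) {                 /* flag set: a match (example polarity) */
                unsigned offset = 1 + (in[ip] | ((unsigned)(in[ip + 1] & 0xF0) << 4)); /* example 12-bit distance, stored minus 1 */
                unsigned length = (in[ip + 1] & 0x0F) + 3;                             /* example 4-bit length, minimum 3 */
                ip += 2;
                while (length--) { out[op] = out[op - offset]; op++; }
            } else {                                /* flag clear: a literal */
                out[op++] = in[ip++];
            }
        }
    }
}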


  24. #54
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Thumbs up!

  25. #55
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
Quite interesting results were produced by LZSS v0.01 with two-type LZ codes (far match/close match) and an S&S optimizer that keeps two matches for each position. A serious compression gain, but my S&S implementation is slow as hell!

    Soon I will release something experimental and fast enough to be used in practice.


  26. #56
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
I've done lots of experiments with LZP and ROLZ. Well, with the simplest byte I/O such tricks are useless. LZ77 is the only thing that shines here!

  27. #57
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
The simplest LZP is very cool - it's so simple and the encoder is the world's fastest, but the compression is too low...

  28. #58
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
LZP can also be used as a preprocessor. For example, paq9a has an order-12 LZP as a speed optimization for highly redundant files. It can also improve the compression of low-order models.

    zpaq co2.cfg enwik8 -> 36694483, 53.9s
    flzp c enwik8 enwik8.flzp -> 57366279, 7.1s
    zpaq co2.cfg enwik8.flzp -> 30888543, 36.5s

    o2.cfg is a simple order 2 model with direct lookup

    Code:
    comp 0 0 0 0 1
       0 cm 25 5 (134 MB)
    hcomp
      *d<>a a>>= 1 a+=*d a<<= 9 *d=a halt
    post
      0
    end
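For readers not familiar with it, the LZP idea itself fits in a few lines - a rough sketch of the principle only (order-3 context here instead of order-12, nothing to do with the actual flzp/paq9a sources; emit_literal()/emit_match_len() are placeholders, and the match/literal flag a real LZP codes is omitted):
Code:
#include <stdint.h>
#include <stddef.h>

/* Placeholders for the output side of the sketch: */
void emit_literal(uint8_t c);
void emit_match_len(size_t len);

#define LZP_BITS 16
static uint32_t table[1 << LZP_BITS];        /* context hash -> last position seen (0 = unseen) */

static uint32_t ctx_hash(const uint8_t *p)   /* hash of the 3 preceding bytes */
{
    uint32_t ctx = (uint32_t)p[-1] | ((uint32_t)p[-2] << 8) | ((uint32_t)p[-3] << 16);
    return (ctx * 2654435761u) >> (32 - LZP_BITS);
}

static void lzp_encode(const uint8_t *buf, size_t n)
{
    for (size_t i = 3; i < n; ) {
        uint32_t h = ctx_hash(buf + i);
        size_t   p = table[h];
        table[h]   = (uint32_t)i;
        size_t len = 0;
        if (p)                                        /* context seen before: try to extend the prediction */
            while (i + len < n && buf[p + len] == buf[i + len]) len++;
        if (len) { emit_match_len(len); i += len; }   /* prediction held for 'len' bytes */
        else     { emit_literal(buf[i]); i += 1; }    /* misprediction (or unseen context): literal */
    }
}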

  29. #59
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
The new program will have an optimized hash function - the hash multiplier was tuned using my optimizer! It's probably a good idea to release the hash optimizer, to build extremely efficient hash functions for a given hash size and data type.
Currently my LZSS (it's not actually LZSS, it's an LZ77) has a 512 KB dictionary. Finally, I made a proper hash-chain-based match finder with a small memory footprint. Current memory usage is ~20 MB for compression and ~16 MB for decompression (16 MB of that is an I/O buffer). The compression is not fast due to the exhaustive string search; the compressor has an online path optimizer - kind of a lazy matching. The decompression is extremely fast, I think it's the world's fastest; ENWIK9 decompression takes 5 sec. on a poor Pentium 4 powered machine.
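For reference, a typical multiplicative hash of this kind looks like the sketch below (the multiplier shown is the generic golden-ratio constant, not the optimized value, which isn't published here):
Code:
#include <stdint.h>

/* Multiplicative hash over the next 4 bytes, as typically used to index the
   head table of a hash-chain match finder. */
#define HASH_BITS 17

static uint32_t hash4(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}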

The name of the program shouldn't be "LZSS". LZSS is about bit flags to distinguish a match from a literal, and here we have no bit flags. I just can't figure out what to call it. It should contain "LZ" - *SOMETHING*LZ or LZ*SOMETHING*.


  30. #60
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
Or I may keep the LZSS idea and use a 1 MB dictionary. Such a compressor will be less stable on already compressed data and have slightly slower decompression, but a higher compression ratio in most cases.
