
Thread: LZSS with a large dictionary

#1 by encode:

    LZSS with a large dictionary

I've just done some experiments with my dummy LZSS coder.

    + LZSS with 64K window
    + Optimal parsing (S&S in this case is really optimal)
    + Tagged code output

    There are two variants of the LZ-output coding:

1. Flag based
A 1-byte flag field defines eight flags (0 = literal, 1 = match), followed by the eight corresponding items, each either a stored literal (1 byte) or an LZ code (3 bytes):
1 byte - match length
2 bytes - match offset

2. Tag based
The first bit of a 1-byte tag defines:
0 = a run of literals, with the number of literals stored in the low 7 bits
1 = a match, with the match length stored in the low 7 bits of the tag, followed by a 16-bit offset

It's pure and simple. Mainly I play with such a thing because it allows the simplest possible decoder, which can be written in pure ASM.
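
To make the flag-based layout concrete, here is a minimal C sketch of a decoder for variant 1. The LSB-first flag order, the 4-byte minimum match stored as length minus 4, and the raw little-endian offset are assumptions on my part; the real coder may differ in these details.
Code:
#include <stddef.h>

#define MINMATCH 4  /* assumed minimum match length */

/* Decode one block of the flag-based format; returns the decompressed size. */
size_t lzss_decode_flags(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        unsigned flags = in[ip++];                         /* eight 0/1 flags */
        for (int i = 0; i < 8 && ip < in_len; i++, flags >>= 1) {
            if (flags & 1) {                               /* 1 = match: 3-byte LZ code */
                unsigned len = in[ip] + MINMATCH;          /* 1 byte - match length */
                unsigned off = in[ip + 1] | (in[ip + 2] << 8); /* 2 bytes - match offset */
                ip += 3;
                const unsigned char *src = out + op - off;
                while (len--) out[op++] = *src++;          /* overlapping copy is fine */
            } else {                                       /* 0 = literal: stored byte */
                out[op++] = in[ip++];
            }
        }
    }
    return op;
}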

    The second variant is much more stable on already compressed data:
    A10.jpg:
    1: 946,079 bytes
    2: 848,320 bytes

At the same time, the first variant may provide higher compression in general:
    world95.txt:
    1: 968,257 bytes
    2: 1,001,897 bytes

Additionally, in both cases we may increase the window size at the cost of a smaller MAXMATCH, since the offset and the length share the fixed code space.

Check out some testing results for the first coding variant with different window sizes:
    world95.txt:
    64K: 968,257 bytes
    128K: 868,035 bytes
    256K: 797,373 bytes
    512K: 774,220 bytes

For comparison, Deflate compresses world95.txt down to 862,824 bytes, i.e. starting from a 256K window we beat Deflate without Huffman or any other entropy coding, using pure byte I/O with no bit I/O at all!

Anyway, the main benefit is really crazy decompression speed (the fastest?) and an extremely small decompressor.


#2 by toffer:
Good job. I guess you can beat Deflate since it lacks optimal parsing and only has a 32K window? Could you compare against a Deflate implementation with optimal parsing? I think 7-Zip does that (I might be wrong).

#3 by encode:
    Quote Originally Posted by toffer View Post
    Good job. I guess that you can beat deflate, since it lacks optimal parsing and only has a 32k window? Could you compare against a deflate implementation with optimal parsing? I think 7zip does so? (i might be wrong)
Using 7-Zip with the best settings (largest word size) I get:
    832,013 bytes

#4 by osmanturan:
    @encode:
Hmm... the result smells like a non-sliding window. Do you use a sliding or a non-sliding window? It seems the small-window versions suffer from non-sliding-window behaviour.

#5 by encode:
    Quote Originally Posted by osmanturan View Post
    @encode:
    Hmm...The result smells like non-sliding window. Do you use sliding or non-sliding window? Because, it seems small window versions suffer from non-sliding window behaviour.
Here I compress data in large blocks (in the best case the block is the whole file), i.e. I just drop the matches that are too distant... In other words, I use a "sliding window".

#6 by osmanturan:
I'm just confused... Are you searching for matches in "large blocks" and discarding the ones that are too distant? If so, you could dynamically detect the optimum dictionary size by some metric and report a "maximum dictionary size" instead of a "dictionary size". Also, by detecting the optimum dictionary size you could improve compression by adapting the match offset coding (3-4 bytes per match offset for large dictionaries, 1-2 bytes for small ones). If you don't want to write a complex compressor that finds the optimum dictionary size, you could at least use this feature to collect some statistics; that would be the best way to decide on the window size.

BTW, if I were you, I would prefer the fastest variant of the coding style, because I would expect very fast LZ behaviour from such a small and efficient coder.

#7 by encode:
The goal of my LZSS is to get the compression ratio close to Deflate's while keeping the decoder as simple as possible.

Actually, it's for my EXE-packer/protector/cryptor project...

Currently I'm keeping the flag coding (1st variant) and a 128K window. BTW, it seems my LZSS is stronger than LZOP (at least on the ENWIKs; ENWIK9: 361,122,040 bytes vs LZOP's 366,349,786)... I expect that my beast has faster decompression as well! If so, I can set a new point on the Pareto frontier at LTCB...

#8 by encode:
    Now I'm thinking about some ASM tricks in the decoder.

    For flag extraction:
    Code:
SHR EBX, 1 // shift the next flag bit into the carry flag
JNC @@2    // carry = 0 -> literal
And the main question is how to store one extra bit of the offset. Maybe we can eliminate branches using ADC:
    Code:
SHR ECX, 1   // shift the next flag bit into the carry flag
ADC EDX, EDX // EDX = 2*EDX + carry: append the bit to the offset with no branch
In addition, I'll try an extension: variable-length offset coding. With the current scheme we are able to encode matches starting from four bytes long. For short matches we may limit the max offset to, say, 512; thus we are able to encode a 3-byte string into just two bytes. It will be interesting to see how this affects the compression...
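
Just to illustrate the idea, one possible split of such a 2-byte short-match code (purely my guess at a layout, not what I'll necessarily implement) is 9 bits of offset and 7 bits of length:
Code:
/* Illustrative only: one possible 2-byte short-match layout,
   9 bits of offset (1..512) and 7 bits of length (3..130). */
typedef struct { unsigned offset, length; } short_match;

static short_match decode_short_match(const unsigned char *p)
{
    unsigned code = (unsigned)p[0] | ((unsigned)p[1] << 8);
    short_match m;
    m.offset = (code & 0x1FF) + 1;   /* low 9 bits: offsets 1..512  */
    m.length = (code >> 9) + 3;      /* high 7 bits: lengths 3..130 */
    return m;
}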

#9 by osmanturan:
As I posted before, the match length distribution for most files looks like this:

Lengths < 8 come first, by a huge margin
Lengths in (8, 16] come second, by a big margin
Lengths > 16 are the least probable cases

Note that for executables lengths < 8 are much more probable. I hope this helps you.

#10 by encode:
    Quote Originally Posted by osmanturan View Post
    Note that, for executables <8 much more probable. I hope, this helps you.
Again you post well-known things...

#11 by osmanturan:
    Quote Originally Posted by encode View Post
    Again you post well known things...
So I wonder why you don't use these well-known statistics in any of your LZ coders?

#12 by encode:
    Quote Originally Posted by osmanturan View Post
    So, I wonder why you don't use this well known statistics in your any LZ coder?
Again, you post first and think later. Did you read the part about the simplest decoder possible? Any tricks or additions lead to unwelcome complexity in the decoder. Like I said, my goal is to keep things as simple as possible.

Anyway, another cool trick with length/offset coding is the one used by NTFS compression. Depending on the current position within a block, we reserve a defined number of bits for the match offset and for the match length, sharing the code space more cleverly.

For example, say the current position is i=2000. The match offset cannot be larger than 2000, so we may reserve just 11 bits for the offset and use the freed code space for a longer match length. Moving further on, the offset needs more bits to code, so we just limit the max match length accordingly...
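
A small sketch of that position-dependent split (following the NTFS/LZNT1 scheme as I understand it: a 16-bit match code with a floor of 4 offset bits; the exact constants are assumptions):
Code:
#include <stdint.h>

/* Decide how a 16-bit match code is split between offset and length bits
   for a given output position. */
static void split_for_position(uint32_t pos, unsigned *offset_bits, unsigned *length_bits)
{
    unsigned ob = 4;                  /* assumed floor of 4 offset bits */
    while ((1u << ob) < pos) ob++;    /* enough bits to reach 'pos' bytes back */
    *offset_bits = ob;
    *length_bits = 16 - ob;           /* the freed code space goes to the length */
}
/* e.g. pos = 2000 gives 11 offset bits and 5 length bits, as in the example above */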

It's a cool trick, but not really useful in my case.

#13 by encode:
Also, we may store an extra bit or two to determine the offset length. In this case we get an LZSS with a practically unlimited dictionary size (16 MB, 4 GB, ...).

    For example:

    if offset<=256 we need just one byte
    if offset<=65536 we need two bytes
    etc.

In ASM just one extra branch is needed; as with the tags, we may use SHR+JC/JNC:
    Code:
    SHR ECX, 1
    JC @@4 // Long offset - read two bytes
    // Short offset - read one byte
I will test how such an idea affects the compression. And again, the optimal S&S parsing should really help here.
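
In C the same byte-expandable offset read might look roughly like this (whether the set bit means "long" and the byte order are my assumptions):
Code:
/* Read a byte-expandable offset: one stored byte for offsets 1..256,
   two stored bytes for offsets 1..65536, selected by the extra bit. */
static unsigned read_offset(const unsigned char **pp, int long_offset)
{
    const unsigned char *p = *pp;
    unsigned off;
    if (long_offset) {
        off = ((unsigned)p[0] | ((unsigned)p[1] << 8)) + 1;  /* 1..65536 */
        p += 2;
    } else {
        off = (unsigned)p[0] + 1;                            /* 1..256 */
        p += 1;
    }
    *pp = p;
    return off;
}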

#14 by osmanturan:
There must be a misunderstanding. By offering these "well-known" but surprisingly "not well-used" statistics, I expected you to take them into account when limiting the match length or anything else. Because, as you said before, you are making this compressor for EXE packing. If we ignore the large redundant resources in an EXE, the rest of the file should compress better with less effort using small match lengths. Also, you can keep the unused match length bits in a byte by shifting them into a temporary register for future use, like some codecs do. That's your choice; decide whether it's too complex or not. I look forward to your test releases.

#15 by encode:
Actually, the main criterion is the compression ratio gained vs. the decoder complexity. If some trick gives just a tiny compression gain but requires serious decoder changes, I say it's not worth it. If compression were really the only goal, of course arithmetic coding and other stuff would be used here. As you can see, even the LZ output is tuned for an ASM implementation.

Well, for executables it is important to code small matches (2-4 byte strings) efficiently. The coder described above is able to code strings starting from four bytes.

I made some changes to test the byte-expanded offset coding paired with an extra stored bit. Thus we may code a match with an offset of 1..512 into just two bytes, and 1..131072 into three bytes. The simpler modification, without the extra bit, codes 1..256/1..65536.

Additionally, I made a version with a 16 MB dictionary using such byte-expandable offset coding: 1..256/1..65536/1..16777216.

    The result for world95.txt:
    754,647 bytes

Not as big a gain as expected.


#16 by osmanturan:
I know it's still a draft version, but could you post some test results with EXE files instead of stationary sources? That would be better for receiving useful comments. Also, do you plan to make an E8/E9 preprocessor or anything else?

#17 by encode:
I don't plan any filters, at least currently, because even a simple E8/E9 transformer is more complex than the current LZSS decoder!

    OK, some testing results with EXEs from EncCC:
Here are two variants: the first with fixed-length (F) offset coding, the second with variable-length (V), or byte-expanded, offset coding. Both versions are tested with various window sizes.

    MPTRACK.EXE (1,159,172 bytes)
    F-64K: 687,469 bytes
    F-128K: 670,417 bytes
    F-256K: 659,507 bytes
    F-512K: 658,311 bytes
    V-64K: 639,090 bytes
    V-128K: 616,299 bytes

    Reaktor.exe (14,446,592 bytes)
    F-64K: 3,224,609 bytes
    F-128K: 3,211,073 bytes
    F-256K: 3,299,554 bytes
    F-512K: 3,501,945 bytes
    V-64K: 3,018,197 bytes
    V-128K: 3,048,161 bytes

    TraktorDJStudio3.exe (29,124,024 bytes)
    F-64K: 8,586,467 bytes
    F-128K: 8,350,275 bytes
    F-256K: 8,299,888 bytes
    F-512K: 8,583,791 bytes
    V-64K: 7,956,252 bytes
    V-128K: 7,769,696 bytes

    Photoshop.exe (19,533,824 bytes)
    F-64K: 10,384,126 bytes
    F-128K: 9,921,033 bytes
    F-256K: 9,514,912 bytes
    F-512K: 9,257,756 bytes
    V-64K: 9,700,028 bytes
    V-128K: 9,211,801 bytes

    Doom3.exe (5,427,200 bytes)
    F-64K: 2,573,723 bytes
    F-128K: 2,500,681 bytes
    F-256K: 2,456,553 bytes
    F-512K: 2,432,881 bytes
    V-64K: 2,402,924 bytes
    V-128K: 2,324,356 bytes

To be honest, I still prefer the versions with fixed offsets, due to the decoder simplicity...



#18 by encode:
Check out an SFX file to test my LZSS:
r-sfx.exe

This one is written in Delphi, with the decoder in 100% ASM... Run this .exe and "Reaktor.exe" will be extracted to the current directory.


#19 by osmanturan:
Tested with AcuTimer on my laptop (you already know the specs):

    Decompression: 0.604 Seconds

    Note that "acutimer echo" takes 0.053 Seconds. So, we are able to say that it's decompression speed about 23 MB/sec. This is roughly my hdd writing speed limit. Good job ilia!

    Edit:
    WinRAR 3.71 (ZIP Fastest): 3,122,235 bytes
    r-sfx.exe: 3,231,041 bytes

    Close enough. Just make it better!

#20 by encode:
VERY soon I might release a standalone compressor called LZSS, just compression/decompression using the algorithm described in this thread. I haven't decided yet about the window size: 128K or 256K? Although LZSS is completely written in C++, the decompression speed should be about the same. Maybe later I'll release an ASM-driven decoder for comparison. Like I said, with ASM we may do some extra tricks in favour of faster code.

#21 by encode:
    Quote Originally Posted by osmanturan View Post
    Tested with acutimer on my laptop (you already know the specification)

    Decompression: 0.604 Seconds
    I tested standalone LZSS on my machine, decompressing 'Reaktor.exe':
    Kernel Time = 0.031 = 00:00:00.031 = 11%
    User Time = 0.062 = 00:00:00.062 = 22%
    Process Time = 0.093 = 00:00:00.093 = 33%
    Global Time = 0.281 = 00:00:00.281 = 100%


Note that the release will be faster due to some extra optimizations; currently it is just a DRAFT version...

#22 by osmanturan:
I think you should find a better name rather than plain LZSS. Also, could you post some compression/decompression timings for the 128K and 256K window sizes? I personally prefer your corpus for this test (it includes several large executables), so that we (I mean users) can comment on the window size.

BTW, I remember your previous EXE-packer attempt; Kaspersky detected it as a virus. I hope my new Kaspersky will not detect this one as a virus.

#23 by osmanturan:
    Quote Originally Posted by encode View Post
    I tested standalone LZSS on my machine, decompressing 'Reaktor.exe':
    Kernel Time = 0.031 = 00:00:00.031 = 11%
    User Time = 0.062 = 00:00:00.062 = 22%
    Process Time = 0.093 = 00:00:00.093 = 33%
    Global Time = 0.281 = 00:00:00.281 = 100%


    Note that, the release will be faster due to some extra optimizations - currently it is just DRAFT version...
Please do not use Igor's timer; it has very low resolution. LovePimple is an expert in this area. You can ask him for the reasons not to use Igor's timer. He has a very long bug list for it.

#24 by encode:
    Quote Originally Posted by osmanturan View Post
    Please do not use Igor's timer. It has very small resolution. LovePimple is an expert in this area. You can ask it for the reason to not use Igor's timer. He has a very long bug list about Igor's timer
OK! With AcuTimer I got 0.326 seconds...

Well, it's named just LZSS because it's not a complete compressor, only a proggie for experiments. Thus the encoder is very slow and eats lots of memory (500+ MB): we process data in 32 MB blocks, finding all matches at each position, and afterwards do the S&S parsing on the entire 32 MB block. In a practical coder I would need to include fast modes with simple string searching, as well as a binary-tree-based match finder for the max mode. Anyway, I'm just messing around with byte-aligned LZSS...
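
For anyone curious, such an optimal parse over a block boils down to a backward dynamic program. A minimal sketch, assuming the flag-based costs above (9 bits per literal, 25 bits per match, any match shortenable down to the 4-byte minimum), not the actual encoder:
Code:
#include <stdlib.h>

#define MINMATCH    4
#define LIT_BITS    9u   /* 8 data bits + 1 flag bit  */
#define MATCH_BITS 25u   /* 24 data bits + 1 flag bit */

/* best_len[i] = longest match found at position i (0 if none).
   On return take_len[i] = 1 for a literal, otherwise the match length to emit. */
void optimal_parse(const unsigned *best_len, size_t n, unsigned *take_len)
{
    unsigned long long *cost = malloc((n + 1) * sizeof *cost);
    cost[n] = 0;
    for (size_t i = n; i-- > 0; ) {
        cost[i] = cost[i + 1] + LIT_BITS;                 /* option 1: literal */
        take_len[i] = 1;
        for (unsigned l = MINMATCH; l <= best_len[i] && i + l <= n; l++) {
            if (cost[i + l] + MATCH_BITS < cost[i]) {     /* option 2: match of length l */
                cost[i] = cost[i + l] + MATCH_BITS;
                take_len[i] = l;
            }
        }
    }
    free(cost);
}
/* The parse is then read forward: i += take_len[i], emitting a literal when
   take_len[i] == 1 and a match otherwise. */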

Like I said, the goal is to achieve the maximum compression within the simplest possible compression format, one which allows the simplest and fastest decoder.

#25 by osmanturan:
    Quote Originally Posted by encode View Post
    ...the encoder is very slow and eats lots of memory (500+ MB)...
    It's hard to imagine that using 4 MB or more window size

#26 by encode:
    Quote Originally Posted by osmanturan View Post
    It's hard to imagine that using 4 MB or more window size
Actually, the match finder is able to find matches across the whole 32 MB... Recall the dropping of too-distant matches. Experimental stuff, you know...

#27 by encode:
Added a fast mode to my LZSS... Now it's a very fast compressor, with the max mode still available, which provides the smallest output possible in the given compression format.

#28 by encode:
    BTW, it compresses ENWIK8 in 2 sec.

#29 by osmanturan:
    Very fast. What is the compressed size?

#30 by encode:
    Fast-128K: 53,871,336 bytes
    Max-128K: 41,038,270 bytes
    Fast-256K: 53,038,492 bytes
    Max-256K: 38,388,434 bytes



