Results 1 to 21 of 21

Thread: LZW v0.1 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    OK, playing with LZW algorithm I've done this:
    lzw01.zip (42 KB)

    Encoder is in DRAFT/DEBUG.
    Decoder is super-fast and cool!

    Enjoy anyway!

  2. #2
    Member
    Join Date
    Jan 2008
    Posts
    33
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello encode!

    Great! Thanks!
    I will test it later!

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Ilia!

  4. #4
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    thank you Ilia
    I'll have a look!

    Best regards!

  5. #5
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Very fast in decompression! Hi!

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Just implemented a tricky LZMW modification.

    Quick testing results:

    world95.txt:
    COMPRESS: 1,133,483 bytes
    LZW v0.1: 1,012,516 bytes
    LZW v0.2: 891,228 bytes

    fp.log:
    COMPRESS: 2,699,855 bytes
    LZW v0.1: 1,892,398 bytes
    LZW v0.2: 1,188,390 bytes

    bible.txt:
    COMPRESS: 1,377,093 bytes
    LZW v0.1: 1,306,124 bytes
    LZW v0.2: 1,217,840 bytes

    Crazy performance for a coder with no Huffman, no arithmetic coding and no bit I/O.
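    For reference, the LZMW trick differs from classic LZW in how new phrases are formed. A minimal sketch (my illustration, not encode's code): LZW appends the previous phrase plus the first character of the current match, while LZMW appends the concatenation of both whole phrases, so dictionary entries grow much faster.

```c
#include <stdio.h>

/* Illustration of dictionary growth: given the previous phrase and the
   current match, each scheme forms a different new dictionary entry. */
static void lzw_new_phrase(const char *prev, const char *cur, char *out) {
    sprintf(out, "%s%c", prev, cur[0]);   /* LZW: prev + first char of cur */
}

static void lzmw_new_phrase(const char *prev, const char *cur, char *out) {
    sprintf(out, "%s%s", prev, cur);      /* LZMW: prev + whole cur */
}
```

    After matching "th" and then "he", LZW learns "thh" while LZMW learns "thhe".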


  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    ENWIK8: 41,965,138 bytes
    ENWIK9: 367,636,512 bytes


  8. #8
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    How large is your window? And how large is the dictionary altogether?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    My LZW has no window as such - I call it an I/O buffer, and it is 16 MB long.

    The dictionary can hold up to 65,536 phrases (16-bit codes)...

    Nothing special...

  10. #10
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by encode
    Just implemented some tricky LZMW modification.
    Looks promising, considering it still uses the same decoder. Thanks!

  11. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    MW requires decoder modification, since it adds more strings to the dictionary.

    Have you ever heard about Bernstein's yabbawhap? It adds nice dictionary management.

    But is your "buffer" sliding?

    ...and why not look up 12-bit dictionaries by context? You can still do fast lookup with the same decoder complexity. And the memory hit wouldn't be too high.
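    toffer's context idea might look like this in outline - a speculative sketch (the names and the one-byte context choice are mine, not from the thread): one small 12-bit dictionary per one-byte context, so decoding stays a single table lookup.

```c
#define CODE_BITS 12
#define NCODES (1 << CODE_BITS)          /* 4096 codes per context */

struct entry { int pos, len; };

/* One dictionary per 1-byte context: 256 * 4096 entries in total. */
static struct entry dict[256][NCODES];

static struct entry *lookup(unsigned char ctx, unsigned code) {
    return &dict[ctx][code & (NCODES - 1)];
}
```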
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Black_Fox
    Looks promising, considering it uses still the same decoder
    The decoder is not the same - it's even faster!

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by toffer
    Have you ever heard about Bernstein's yabbawhap? It adds nice dictionary management.
    As you can see, I have used the UNWHAP-like term many times...
    Quote Originally Posted by toffer
    But is your "buffer" sliding?
    Nope. The "dictionary" is sliding - that means each time I replace the oldest entry, i.e. the dictionary is organized as a circular table.
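    The circular table can be sketched as follows (a minimal illustration; TABSIZE matches the 65,536-phrase dictionary mentioned above, with codes 0..255 reserved for literal bytes):

```c
#define TABSIZE 65536   /* 16-bit codes */

/* Advance the dictionary write cursor; when the table is full the cursor
   wraps back to 256, so the oldest phrase slot is overwritten. */
static unsigned next_slot(unsigned r) {
    return (r + 1 == TABSIZE) ? 256 : r + 1;
}
```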


  14. #14
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    How can your decoder be the same? MW adds phrases in a different manner compared to "textbook" LZW. I didn't know about "unwhap" - I guess it's the same?!
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by toffer
    ...and why not look up 12-bit dictionaries by context? You can still do fast lookup with the same decoder complexity. And the memory hit wouldn't be too high.
    Currently I prefer to improve the baseline LZW with no context stuff - as you can see, that's the direction I'm moving in...

    I still want to keep the 16-bit codes - i.e. no Huffman, no arithmetic coding, no bit I/O - it's kind of a challenge...

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by toffer
    How can your decoder be the same? MW adds phrases in a different manner compared to "textbook" LZW.
    The decoder is NOT the same - it's just slightly modified, which makes it even faster...

    Quote Originally Posted by toffer
    I didn't know about "unwhap" and I guess it's the same?!
    Yep, the term UNWHAP refers to improved LZW-type decoders - read about LZRW5!

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts

  18. #18
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    And in the end COMPRESS will be replaced by LZW 1.0

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts


    Well, the current aim is just a tiny decoder + research. I think that due to patent issues, LZW and LZW-related algorithms were abandoned by developers, while at the same time work on LZ77 took off... LZW just lost its potential. As you can see, even with no Huffman my LZMW-type encoder is very close to Deflate on text files!
    The release of LZW 0.2 will be slightly delayed due to experiments...
    At least the decoder will be open source. By the way, look - it can fit into this post:
    Code:

    /* Surrounding declarations reconstructed from the thread's description
       (65,536-phrase dictionary, 16 MB I/O buffer); the post showed only
       the function itself. */
    #include <stdio.h>
    #include <string.h>

    #define TABSIZE 65536            /* dictionary size: 16-bit codes */
    #define N (16 << 20)             /* 16 MB I/O buffer */

    static struct { int pos, len; } tab[TABSIZE];
    static unsigned char buf[N];
    static FILE *in, *out;
    static int r = 256;              /* next slot; codes 0..255 are literals */

    static void decompress() {
      int prv = 0;                   /* length of previous phrase (0 = none yet) */
      int i = 0;                     /* current write position in buf */
      unsigned short c;
      while (fread(&c, 1, sizeof(c), in)) {
        int len = 1;
        if (c >= 256) {              /* phrase code: copy it from earlier in buf */
          len = tab[c].len;
          memcpy(&buf[i], &buf[tab[c].pos], len);
        }
        else                         /* literal byte */
          buf[i] = (unsigned char)c;
        if (prv) {                   /* new entry: previous phrase + current one */
          tab[r].pos = i - prv;
          tab[r].len = prv + len;
          if (++r == TABSIZE)        /* table full: wrap, overwrite the oldest */
            r = 256;
        }
        prv = len;
        if ((i += len) == N) {       /* buffer full: flush it */
          fwrite(buf, 1, i, out);
          prv = 0;
          i = 0;
        }
      }
      if (i)                         /* flush the tail */
        fwrite(buf, 1, i, out);
    }

    The complete decoder!


  20. #20
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Nice to see such "old-school" stuff evolve. I think there's room for improvement in the management of incompressible data (like having a special code for "copy the next few bytes") and in initial adaptation, since compression is lost by the 16-bit code output.
    If you can only gain compression by replacing strings of 3+ bytes (16-bit codes), you could always just output a character and use the least frequent character as an escape to output a 16-bit code (for a string replacement or for coding a normal character). Data won't expand heavily (only by two bytes when the least frequent character appears) and you will gain compression. What about this?
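    This escape scheme could be sketched as follows - a hypothetical illustration (the function name and byte layout are mine, not from any posted code): literals cost one raw byte, and the least frequent byte value acts as an escape introducing a 16-bit code, used either for a phrase reference or to encode a literal occurrence of the escape byte itself.

```c
/* Hypothetical sketch of the escape idea described above.
   `esc` is the least frequent byte value in the block.
   Returns the number of bytes written to `out`. */
static int emit(unsigned char esc, unsigned c, unsigned char *out) {
    if (c < 256 && c != esc) {       /* plain literal: 1 byte */
        out[0] = (unsigned char)c;
        return 1;
    }
    out[0] = esc;                    /* escape + 16-bit code: 3 bytes */
    out[1] = (unsigned char)(c & 0xFF);
    out[2] = (unsigned char)(c >> 8);
    return 3;
}
```

    In the worst case a real occurrence of the escape byte expands from 1 to 3 bytes, matching the "only by two bytes" estimate.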
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by toffer
    What about this?
    There is definitely room for experiments. For example, we might even add arithmetic compression and encode the codes as deltas. With ARI we could use flags for literals/matches, and so on...


