Results 1 to 11 of 11

Thread: LZW implementation

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    103
    Thanks
    41
    Thanked 8 Times in 8 Posts

    LZW implementation

    Could anyone recommend a high quality open source implementation of LZW? By high quality, I mean close to that of Deflate or LZ4. Also are there any LZ4 equivalent format of LZW, in terms of compression and decompression speeds? Thx in advance for any info.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,328 Times in 759 Posts
    There was this thread recently: https://encode.su/threads/3419-Are-t...eword-encoding
    As to speed, LZW without entropy coding may have similar complexity, but LZW still has to do more work normally (like dictionary updates),
    plus LZ4 is already optimized - has different handlers for some ranges of match lengths etc. The same would have to be done for LZW too.

  3. Thanks:

    smjohn1 (29th October 2020)

  4. #3
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    103
    Thanks
    41
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Shelwien View Post
    There was this thread recently: https://encode.su/threads/3419-Are-t...eword-encoding
    As to speed, LZW without entropy coding may have similar complexity, but LZW still has to do more work normally (like dictionary updates),
    plus LZ4 is already optimized - has different handlers for some ranges of match lengths etc. The same would have to be done for LZW too.
    Thanks a lot. That thread is quite helpful. But besides lzw-ab, what is a best quality open source package for standard LZW? Compress? And just curious, why is LZW not included in LZ-Bench or already but under a different package name?

    I agree, LZ4-like implementation might not be better than LZ4 in speeds, but it might be worthy trying, as there is room in more efficiently represent index of words in the dictionary for better compression ratio, which in turn may effect speeds. No one tried before?

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,328 Times in 759 Posts
    > what is a best quality open source package for standard LZW? Compress?

    https://github.com/andrew-aladev/lzws
    https://en.wikipedia.org/wiki/Compress#External_links
    https://sourceforge.net/p/giflib/code/ci/master/tree/
    https://github.com/rouault/libtiff/tree/master/libtiff

    > And just curious, why is LZW not included in LZ-Bench or already but under a different package name?

    LZW had been patented for ~30 years, so people try to ignore it.

    Also compression ratio is nothing interesting.

    > efficiently represent index of words in the dictionary for better
    > compression ratio, which in turn may effect speeds. No one tried before?

    LZW is a specific algorithm, which is not especially good.

    There're various different implementations of LZ78 methods,
    like WRT text filters, or https://github.com/jrmuizel/GLZA

  6. #5
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    103
    Thanks
    41
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Shelwien View Post
    >

    LZW is a specific algorithm, which is not especially good.

    There're various different implementations of LZ78 methods,
    like WRT text filters, or https://github.com/jrmuizel/GLZA
    Thanks a lot again for lots of useful info.

    So in your opinion, what is the best variation in the LZ78 family, if not LZW? Any LZ4 counterpart in LZ78 family? I am just wondering why there are so many LZ77 variations in lz-bench but not LZ78? Is it because LZ78 has no advantage over LZ77, in compression ratio and/or speeds?

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,328 Times in 759 Posts
    > So in your opinion, what is the best variation in the LZ78 family, if not LZW?

    Define "best".
    Compression-wise its probably GLZA currently.

    > Any LZ4 counterpart in LZ78 family?

    If you want something simple, it may be this: https://encode.su/threads/2983-Crack...ll=1#post57817
    As to speed potential, LZ78 is inherently slower than LZ77,
    because it has to do extra work during decoding, also
    large-alphabet entropy coding is much harder to do.

    > I am just wondering why there are so many LZ77 variations in lz-bench but not LZ78?

    LZW/LZ78 variants were pretty popular in 198x, then people started patenting them,
    and thus killed that branch of development.
    Its somewhat described here: https://oku.edu.mie-u.ac.jp/~okumura...n/history.html

    Also lzbench is just a personal project of its developer (inikep): https://github.com/inikep/lzbench
    For example, here's another similar benchmark: https://github.com/powturbo/TurboBench
    with some different codecs.
    Or this: http://mattmahoney.net/dc/text.html - this one includes some LZW versions.

    > Is it because LZ78 has no advantage over LZ77, in compression ratio and/or speeds?

    Technically LZ78 includes LZ77 (I mean classes of compression algorithms, not original methods) -
    we can say that {token_type;distance;length} is just a method of dictionary index generation.

    Also there's a whole spectrum of intermediate algorithms,
    which attempt to reduce LZ77 redundancy (assigning many different codes to same strings):
    LZFG,ACB,LZP,ROLZ,even PPM can be considered one (ROLZ with unary length coding).

    I do think that LZ78 has potential to be stronger than LZ77 - we can even see that with GLZA.
    It may be not realized yet, but we cannot force people to work on it.

  8. Thanks (2):

    anormal (2nd November 2020),smjohn1 (29th October 2020)

  9. #7
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    103
    Thanks
    41
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Shelwien View Post
    > So in your opinion, what is the best variation in the LZ78 family, if not LZW?

    Define "best".
    Compression-wise its probably GLZA currently.

    > Any LZ4 counterpart in LZ78 family?

    If you want something simple, it may be this: https://encode.su/threads/2983-Crack...ll=1#post57817
    As to speed potential, LZ78 is inherently slower than LZ77,
    because it has to do extra work during decoding, also
    large-alphabet entropy coding is much harder to do.

    > I am just wondering why there are so many LZ77 variations in lz-bench but not LZ78?

    LZW/LZ78 variants were pretty popular in 198x, then people started patenting them,
    and thus killed that branch of development.
    Its somewhat described here: https://oku.edu.mie-u.ac.jp/~okumura...n/history.html

    Also lzbench is just a personal project of its developer (inikep): https://github.com/inikep/lzbench
    For example, here's another similar benchmark: https://github.com/powturbo/TurboBench
    with some different codecs.
    Or this: http://mattmahoney.net/dc/text.html - this one includes some LZW versions.

    > Is it because LZ78 has no advantage over LZ77, in compression ratio and/or speeds?

    Technically LZ78 includes LZ77 (I mean classes of compression algorithms, not original methods) -
    we can say that {token_type;distance;length} is just a method of dictionary index generation.

    Also there's a whole spectrum of intermediate algorithms,
    which attempt to reduce LZ77 redundancy (assigning many different codes to same strings):
    LZFG,ACB,LZP,ROLZ,even PPM can be considered one (ROLZ with unary length coding).

    I do think that LZ78 has potential to be stronger than LZ77 - we can even see that with GLZA.
    It may be not realized yet, but we cannot force people to work on it.

    Again, thx for great info. LZ78's dictionary is more global than LZ77's, so it might have potential to have better compression ratio. But of course, I am looking at speeds too. As of decompression speed, we certainly want as high as possible. In quite a lot applications, though, since decompression speed is much higher than compression speed, and if compression ( write ) and decompression ( read ) happen equally often, i.e., more symmetric, then decompression speed usually isn't a bottleneck, so a bit lower decompression speed is a good tradeoff for higher compression ratio. Now that LZW's patent should have expired ( am I correct? ), people may be able to start to re-examine it again. Of course, those are just my random thoughts. Thanks again.

  10. #8
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    279
    Thanks
    117
    Thanked 161 Times in 118 Posts
    Quote Originally Posted by Shelwien View Post
    I do think that LZ78 has potential to be stronger than LZ77 - we can even see that with GLZA.
    It may be not realized yet, but we cannot force people to work on it.
    I'm working veeery slowly on LZW, my goal is to improve the compression ratio ~regardless of speed.
    I hope to reach some good point (and keep the source code clean to publish it).


    Quote Originally Posted by smjohn1 View Post
    I am looking at speeds too. [...] so a bit lower decompression speed is a good tradeoff for higher compression ratio.
    My LZW version is symmetrical and decompression will be about as slow as the compression, I need to build the dictionary the same for both.
    I think it won't be as fast as other LZ77 implementations.

  11. Thanks:

    smjohn1 (30th October 2020)

  12. #9
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    103
    Thanks
    41
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    I'm working veeery slowly on LZW, my goal is to improve the compression ratio ~regardless of speed.
    I hope to reach some good point (and keep the source code clean to publish it).



    My LZW version is symmetrical and decompression will be about as slow as the compression, I need to build the dictionary the same for both.
    I think it won't be as fast as other LZ77 implementations.
    Nice to know there is effort in this direction. With more memory to use, a big global dictionary seems to be able to improve compression ratio (significantly).

  13. #10
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    169
    Thanks
    31
    Thanked 71 Times in 44 Posts
    seclzw is interesting.

  14. Thanks (4):

    anormal (2nd November 2020),Dresdenboy (31st October 2020),Shelwien (30th October 2020),smjohn1 (30th October 2020)

  15. #11
    Member
    Join Date
    May 2015
    Location
    Italy
    Posts
    57
    Thanks
    0
    Thanked 14 Times in 10 Posts
    I made an effort to approach the compression ratio of the Deflate algorithm (LZ77 + entropy stage), but I failed.
    Anyway you can look to my sources at
    https://encode.su/threads/2182-Group...LZW-compressor

  16. Thanks (2):

    Dresdenboy (31st October 2020),smjohn1 (30th October 2020)

Similar Threads

  1. Shortest LZW decompressor?
    By leob in forum Data Compression
    Replies: 0
    Last Post: 22nd May 2016, 08:38
  2. Improving LZ78/LZW?
    By RichSelian in forum Data Compression
    Replies: 7
    Last Post: 19th September 2011, 19:05
  3. LZW v0.2 is here!
    By encode in forum Forum Archive
    Replies: 6
    Last Post: 8th February 2008, 23:53
  4. LZW v0.1 is here!
    By encode in forum Forum Archive
    Replies: 20
    Last Post: 2nd February 2008, 14:46
  5. New LZW variant
    By encode in forum Forum Archive
    Replies: 14
    Last Post: 28th January 2008, 22:33

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •