
Thread: (LZW variant) LZOOM compression idea and help

  1. #1
    Member
    Join Date
    Jul 2017
    Location
    earth
    Posts
    2
    Thanks
    2
    Thanked 11 Times in 1 Post

    Smile (LZW variant) LZOOM compression idea and help

    Hello. I'm just starting out with compression and I want to ask for help. I've learned and implemented LZW, and I've come up with a variant, but I don't know if it's any good and I haven't managed to implement it properly yet.

    First, here's standard LZW as a baseline...

    LZW-encode:
    i = read-longest-match(stream) // longest dictionary entry matching the upcoming input
    emit i
    note dict[i] ++ peek(stream) // add (matched entry ++ next input symbol) to the dictionary


    This is normal LZW, where the dictionary gets extended immediately. During decoding, the encoder's dictionary is one step ahead of the decoder's, so you need that exceptional case to handle the situation where the newly added dictionary word is used right away (the classic "KwKwK" case - know what I mean?).
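
    Here's a minimal decoder sketch in Python (a sketch only, assuming byte input and an unbounded dictionary) showing exactly where that special case bites:

    def lzw_decode(codes):
        if not codes:
            return b""
        # Dictionary starts with all single bytes.
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            if code in table:
                entry = table[code]
            elif code == next_code:
                # Special case: the encoder used the entry it just added,
                # which the decoder hasn't built yet (the "KwKwK" case).
                entry = prev + prev[:1]
            else:
                raise ValueError("bad code")
            out.append(entry)
            table[next_code] = prev + entry[:1]  # dict[i] ++ peek(stream), decoder side
            next_code += 1
            prev = entry
        return b"".join(out)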

    I wanted to get rid of that special case, so I delayed adding to the dictionary by one step:
    LZW-slow:
    i = read-longest-match(stream)
    emit i
    note dict[j] ++ head(dict[i]) // previous match + first symbol of current match (skipped on first iteration)
    j = i

    (j is the previous iteration's dictionary index)


    I tried this change and it worked fine.
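
    The decoder for this delayed variant needs no special case, because every addition only uses entries both sides already know. A sketch under the same assumptions as above:

    def lzw_slow_decode(codes):
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        prev = None
        out = []
        for code in codes:
            entry = table[code]  # always present: additions lag one step behind
            out.append(entry)
            if prev is not None:  # skipped on the first iteration, as in the pseudocode
                table[next_code] = prev + entry[:1]  # dict[j] ++ head(dict[i])
                next_code += 1
            prev = entry
        return b"".join(out)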

    Now the issue with LZW

    If you compress the string "AAAAA.....AAAA" it will add AA, AAA, AAAA, AAAAA to the dictionary (one symbol longer each time), so a run of N symbols costs about sqrt(2N) codes and the file only shrinks to roughly square-root size.

    What I want: the dictionary grows A, AA, AAA, AAAAA, AAAAAAAA, ... (Fibonacci-like, nearly doubling each time). That should compress a long repeated run to logarithmic size. Much better compression.
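
    A quick back-of-the-envelope check of both growth rates (just summing match lengths, not a real codec):

    N = 1_000_000  # length of the run of identical symbols

    # Plain LZW: match lengths 1, 2, 3, ..., k
    n, length, count = 0, 1, 0
    while n < N:
        n += length; count += 1; length += 1
    print("LZW codes:", count)  # ~ sqrt(2N), about 1414 here

    # Doubling rule: match lengths grow Fibonacci-fashion (1, 1, 2, 3, 5, ...)
    n, prev, cur, count = 0, 0, 1, 0
    while n < N:
        n += cur; count += 1; prev, cur = cur, prev + cur
    print("LZOOM codes:", count)  # logarithmic in N, about 30 here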

    LZOOM:
    i = read-longest-match(stream)
    emit i
    note dict[j] ++ dict[i] // previous match concatenated with the whole current match
    j = i
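
    Here's a naive encoder sketch of the idea (byte input, dictionary of full strings, brute-force longest match - deliberately simple, just to show the logic):

    def lzoom_encode(data: bytes):
        table = {bytes([i]): i for i in range(256)}
        max_len = 1  # longest dictionary entry so far
        codes = []
        prev = None  # the j of the pseudocode
        pos = 0
        while pos < len(data):
            # Entries are NOT prefix-closed under this rule, so we can't stop
            # at the first miss: try the longest candidate first.
            for length in range(min(max_len, len(data) - pos), 0, -1):
                match = data[pos:pos + length]
                if match in table:
                    break
            codes.append(table[match])
            if prev is not None:
                new = prev + match  # dict[j] ++ dict[i]
                if new not in table:
                    table[new] = len(table)
                    max_len = max(max_len, len(new))
            prev = match
            pos += length
        return codes

    The brute-force scan costs up to max_len slices per step, and with entries that can double in length that gets slow quickly.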


    Questions


    • What do you think about this variant?
    • Have you heard of this variant before? (or is it new?)
    • Do you think it can give good compression? (I am worried it will be too "scattered")
    • I tried to implement it, but my code ran really slowly and then crashed - I don't know what's wrong.

  2. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    869
    Thanks
    470
    Thanked 261 Times in 108 Posts
    A known variant of LZW is that, instead of creating new entries with the rule:
    newEntry = existingEntry + character
    you instead do:
    newEntry = existingEntry1 + existingEntry2

    Since characters are part of the initial entry list, this new rule is a superset of the previous one.
    Also, if existingEntry1==existingEntry2, you get the doubling you are looking for.

    This variant creates longer entries faster.
    That doesn't mean it's always better.
    Sure, it beats the simpler version on the "AAAA.....AA" corner case,
    but for more usual distributions it's less clear, and whatever difference there is usually remains small.

  3. Thanks:

    r01 (5th July 2017)

  4. #3
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by r01 View Post
    • I tried to implement it but my code ran really slow and crashed, don't know what is wrong
    You probably ran out of memory (OOM).

    Sorry, it had to be said
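
    A plausible cause is a dictionary of full strings: with entries that keep doubling, the total stored bytes grow roughly quadratically in the number of entries. A sketch of the usual fix (hypothetical helper names), storing each entry as a pair of existing codes so every addition costs O(1) memory:

    entries = {}  # code -> (code1, code2); codes below 256 stand for raw bytes

    def add_entry(code, c1, c2):
        entries[code] = (c1, c2)

    def expand(code):
        # Expand a code back to bytes with an explicit stack,
        # avoiding deep recursion on very long entries.
        stack, out = [code], bytearray()
        while stack:
            c = stack.pop()
            if c < 256:
                out.append(c)
            else:
                c1, c2 = entries[c]
                stack.append(c2)  # pushed first so c1 is expanded first
                stack.append(c1)
        return bytes(out)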
    Last edited by nemequ; 6th July 2017 at 11:00.

  5. #4
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    158
    Thanks
    30
    Thanked 61 Times in 37 Posts
    Quote Originally Posted by r01 View Post
    If you compress the string "AAAAA.....AAAA" it will add AA, AAA, AAAA, AAAAA to the dictionary (one symbol longer each time), so a run of N symbols costs about sqrt(2N) codes and the file only shrinks to roughly square-root size.
    http://web.archive.org/web/200312242...rc/SQUEEZE.LZH
    This codec handles that case fine.

  6. #5
    Member
    Join Date
    May 2015
    Location
    Italy
    Posts
    51
    Thanks
    0
    Thanked 12 Times in 9 Posts
    The LZW variant you (r01) are working on, and which Cyan correctly describes, is known as Miller-Wegman (LZMW).
    The variant used by the source in #4 is Storer's All Prefixes (LZAP); it adds to the dictionary as follows:

    A (first word, in this case a single character) | BCD (next word)
    New entries: AB, ABC, ABCD

    Please note: because the parser needs to know the second word before it starts
    adding to the dictionary, encoder and decoder are always in lock-step and there's
    no need to handle the LZW special case.
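
    A minimal sketch of that addition rule (hypothetical helper, assuming the previous and current matches are already known on both sides):

    def lzap_add(table, next_code, prev: bytes, cur: bytes):
        # Storer's All Prefixes: for prev = "A" and cur = "BCD",
        # add AB, ABC, ABCD (prev extended by every prefix of cur).
        for k in range(1, len(cur) + 1):
            entry = prev + cur[:k]
            if entry not in table:
                table[entry] = next_code
                next_code += 1
        return next_code

    Since both sides call this only after the second word is fully known, their dictionaries never get out of step.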
