Thread: Looking for best possible compression / dictionary entries for simple text

  1. #1

    Hello all, I am interested in compression algorithms, but I am quite a beginner. An application that I am currently trying to hack/modify/translate has a fast compression/decompression algorithm built in (probably something like LZH/LZ77).
    I am able to modify the dictionary entries used by this algorithm, so I want to find the best possible new set of entries in order to save bytes.

    A simplified example of the text document's content:
    Code:
    This is an example text in line 1.
    This is an example text in line 2.
    An example text in line 3 is there as well.
    And a lot more text will follow in the next lines.

    Technical background:
    Each character has a corresponding hex value (the table is simple, like: 0x00 = A, 0x01 = B, ..., 0x41 = a, ...). We are not going to modify this table.
    The compression algorithm uses a dictionary of entries. Let's assume 60 entries can be created, and that they are addressed starting at 0x80.

    Examples:
    0x00 (A) + 0x01 (B) can become 0x80 (AB)
    0x00 (A) + 0x41 (a) can become 0x81 (Aa)

    Entries can also be built from other entries (this is what I call 'depth'), like:
    0x80 (AB) + 0x41 (a) can become 0x82 (ABa)
    and
    0x82 (ABa) + 0x80 (AB) can become 0x83 (ABaAB)

    The max possible 'depth' is probably 8.
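
To make the examples above concrete, here is a minimal Python sketch of how such nested entries expand back into plain character codes. The `entries` table, the function names, and the two ways of measuring 'depth' are my own assumptions for illustration, not taken from the application. The post does not say whether 'depth' means the expanded length in characters or the number of nesting levels, so the sketch computes both.

```python
# Hypothetical dictionary mirroring the examples above: every entry code
# (0x80 and up) maps to a pair of codes, which may themselves be entries.
FIRST_ENTRY = 0x80

entries = {
    0x80: (0x00, 0x01),  # A + B     -> AB
    0x81: (0x00, 0x41),  # A + a     -> Aa
    0x82: (0x80, 0x41),  # AB + a    -> ABa
    0x83: (0x82, 0x80),  # ABa + AB  -> ABaAB
}


def expand(code, entries):
    """Return the list of plain character codes that `code` stands for."""
    if code < FIRST_ENTRY:
        return [code]  # a plain character expands to itself
    left, right = entries[code]
    return expand(left, entries) + expand(right, entries)


def nesting_level(code, entries):
    """0 for a plain character, 1 for a pair of characters, and so on."""
    if code < FIRST_ENTRY:
        return 0
    left, right = entries[code]
    return 1 + max(nesting_level(left, entries), nesting_level(right, entries))


if __name__ == "__main__":
    for code in sorted(entries):
        chars = expand(code, entries)
        print(hex(code), [hex(c) for c in chars],
              "length:", len(chars),
              "nesting:", nesting_level(code, entries))
```

Whichever of the two measures the real decompressor limits to 8 is the one a dictionary builder would have to respect.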


    Question:
    I want to find the best possible dictionary entries/compression for the text.
    The program must be clever enough: an entry that looks good when it is created may turn out worse than another combination or depth, so many iterations will be needed to find the best combination. Speed is no concern, since this only has to be done once and the resulting table is then stored in the program. Decompression is fast enough with max depth = 8. The text must be handled line by line.
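
As a starting point for the search described above, here is a hedged Python sketch of a greedy pair-merging builder, in the style of byte pair encoding. It is a heuristic, not a guaranteed optimum: it repeatedly turns the most frequent adjacent pair into a new entry. The assumptions are all mine: 'depth' is treated as the expanded length of an entry in characters, a pair must occur at least `min_count` times to be worth an entry, and `ord()` stands in for the application's real character table.

```python
from collections import Counter

FIRST_ENTRY = 0x80  # first dictionary code, as in the post
MAX_ENTRIES = 60    # number of entries assumed in the post
MAX_LENGTH = 8      # assumption: "depth" = expanded length in characters


def replace_pair(line, pair, code):
    """Replace non-overlapping occurrences of `pair` in `line`, left to right."""
    out, i = [], 0
    while i < len(line):
        if i + 1 < len(line) and (line[i], line[i + 1]) == pair:
            out.append(code)
            i += 2
        else:
            out.append(line[i])
            i += 1
    return out


def build_dictionary(lines, max_entries=MAX_ENTRIES,
                     max_length=MAX_LENGTH, min_count=2):
    """Greedy builder: returns (entries, compressed_lines)."""
    lines = [list(line) for line in lines]
    entries = {}  # code -> (left, right)
    length = {}   # code -> expanded length in characters

    def size(code):
        return length.get(code, 1)  # plain characters have length 1

    while len(entries) < max_entries:
        counts = Counter()
        for line in lines:  # pairs never span two lines
            # Note: runs like "aaa" are slightly over-counted here.
            counts.update(zip(line, line[1:]))
        candidates = [(n, pair) for pair, n in counts.items()
                      if n >= min_count
                      and size(pair[0]) + size(pair[1]) <= max_length]
        if not candidates:
            break
        _, pair = max(candidates)  # most frequent pair; ties broken by code
        code = FIRST_ENTRY + len(entries)
        entries[code] = pair
        length[code] = size(pair[0]) + size(pair[1])
        lines = [replace_pair(line, pair, code) for line in lines]
    return entries, lines


def expand(code, entries):
    """Expand a code back into plain character codes (for verification)."""
    if code not in entries:
        return [code]
    left, right = entries[code]
    return expand(left, entries) + expand(right, entries)


if __name__ == "__main__":
    text = [
        "This is an example text in line 1.",
        "This is an example text in line 2.",
        "An example text in line 3 is there as well.",
        "And a lot more text will follow in the next lines.",
    ]
    raw = [[ord(c) for c in line] for line in text]  # stand-in for the real table
    entries, packed = build_dictionary(raw)
    print("bytes before:", sum(map(len, raw)),
          "after:", sum(map(len, packed)),
          "entries used:", len(entries))
```

Since speed is no concern, the greedy choice could be replaced by trying the top few candidates at each step and keeping whichever gives the smallest final size. Exhaustive search over all combinations grows very quickly, though.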

    I will add my current C# code to this post later. It builds a dictionary by simply counting occurrences, but only for depth = 2 (pairs of 2 characters), so it doesn't do much yet. I'm hoping someone can point me in the right direction or to code examples, or even better, help with the coding.
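
For reference, the depth = 2 counting step could look roughly like the Python sketch below. This is not the C# code mentioned above, and the function name `count_pairs` is made up for illustration. It counts two-character pairs per line, never across a line break, and skips self-overlapping hits so that a run like "aaa" counts the pair "aa" only once, since only one replacement is possible there.

```python
from collections import Counter


def count_pairs(lines):
    """Count non-overlapping two-character pairs, line by line."""
    counts = Counter()
    for line in lines:
        last_pair, last_pos = None, -2
        for i in range(len(line) - 1):
            pair = line[i:i + 2]
            if pair == last_pair and i == last_pos + 1:
                # Same pair starting one position later, e.g. the second
                # "aa" in "aaa": it overlaps the one already counted.
                continue
            counts[pair] += 1
            last_pair, last_pos = pair, i
    return counts


if __name__ == "__main__":
    text = [
        "This is an example text in line 1.",
        "This is an example text in line 2.",
        "An example text in line 3 is there as well.",
        "And a lot more text will follow in the next lines.",
    ]
    for pair, n in count_pairs(text).most_common(10):
        print(repr(pair), n)
```

The most frequent pairs are natural candidates for the first dictionary entries. Deeper entries need the counts to be recomputed after each replacement, because every replacement changes which pairs are adjacent.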

    Code:
    
    
    Thank you very much.
    Last edited by SnakeSnoke; 15th July 2018 at 00:52.
