# Thread: Looking for best possible compression / dictionary entries for simple text


Hello all, I am interested in compression algorithms, but I am quite a beginner. An application that I am currently trying to hack/modify/translate has a fast compression/decompression algorithm implemented (probably something like LZH/LZ77).
I am able to modify the dictionary entries used by this algorithm, so I want to find the best possible new combination of entries to save bytes.

The simple content of the text document (just an example here):
Code:
```
This is an example text in line 1.
This is an example text in line 2.
An example text in line 3 is there as well.
And a lot more text will follow in the next lines.
```
Technical background:
Each character has a corresponding hex value (the table is simple, like: 0x00 = A, 0x01 = B, ..., 0x41 = a, ...). We are not going to modify this table.
The compression algorithm has a dictionary for its entries. Let's pretend that 60 entries can be created (addressed starting at 0x80).

Examples:
0x00 (A) + 0x01 (B) can become 0x80 (AB)
0x00 (A) + 0x41 (a) can become 0x81 (Aa)

Nesting (depth) is also possible, for example:
0x80 (AB) + 0x41 (a) can become 0x82 (ABa)
and
0x82 (ABa) + 0x80 (AB) can become 0x83 (ABaAB)

The max possible 'depth' is probably 8.
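
To make the nesting concrete, here is a minimal sketch of how such a table could be expanded, written in Python. It assumes that every entry is exactly a pair of codes and that "depth" means nesting level, with a plain character counting as 1 and a pair of plain characters as 2. Both are my assumptions, not something I have confirmed in the application. `DICTIONARY`, `expand` and `depth` are made-up names.

```python
BASE = 0x80  # first dictionary code, as stated above

# Hypothetical table mirroring the examples: entry i has code BASE + i.
DICTIONARY = [
    (0x00, 0x01),  # 0x80 = AB
    (0x00, 0x41),  # 0x81 = Aa
    (0x80, 0x41),  # 0x82 = ABa
    (0x82, 0x80),  # 0x83 = ABaAB
]

def expand(code, dictionary=DICTIONARY):
    """Recursively expand one code into plain character codes."""
    if code < BASE:
        return [code]  # plain character, nothing to expand
    left, right = dictionary[code - BASE]
    return expand(left, dictionary) + expand(right, dictionary)

def depth(code, dictionary=DICTIONARY):
    """Nesting level of a code (assumed definition: plain character = 1)."""
    if code < BASE:
        return 1
    left, right = dictionary[code - BASE]
    return 1 + max(depth(left, dictionary), depth(right, dictionary))
```

Under that definition, 0x83 (ABaAB) expands to five character codes and has depth 4, so a limit of 8 would still leave room for longer entries.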

Question:
I want to find the best possible dictionary entries/compression for the text.
The program must be clever enough: once an entry has been created and used, it may turn out to be worse than another combination or depth would have been. So many iterations are needed to find the best combination. Speed is no concern; we only need to do this once and then store the new table in the program. Decompression is fast enough with max depth = 8. The text must be handled line by line.
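
One possible starting point is a greedy pair merge in the style of byte pair encoding: repeatedly take the most frequent adjacent pair, give it the next free code, replace it everywhere, and count again. The sketch below is in Python and is only a heuristic, so it is not guaranteed to find the best table. It uses `ord()` as a stand-in for the real character table, assumes entries are pairs of codes, and treats depth as nesting level (plain character = 1). All function names are made up.

```python
from collections import Counter

BASE = 0x80        # first dictionary code
MAX_ENTRIES = 60   # dictionary size assumed above
MAX_DEPTH = 8      # assumed nesting limit

def count_pairs(lines):
    """Count adjacent pairs per line, skipping overlaps in runs like 'aaa'."""
    counts = Counter()
    for line in lines:
        i = 0
        while i < len(line) - 1:
            counts[(line[i], line[i + 1])] += 1
            if i + 2 < len(line) and line[i + 2] == line[i] == line[i + 1]:
                i += 2  # the overlapping copy could not be replaced as well
            else:
                i += 1
    return counts

def replace_pair(line, pair, code):
    """Replace every non-overlapping occurrence of pair, left to right."""
    out, i = [], 0
    while i < len(line):
        if i + 1 < len(line) and (line[i], line[i + 1]) == pair:
            out.append(code)
            i += 2
        else:
            out.append(line[i])
            i += 1
    return out

def build_dictionary(text_lines):
    """Greedy search; returns (entries, encoded_lines)."""
    lines = [[ord(c) for c in line] for line in text_lines]  # stand-in table
    entries, depth = [], {}
    level = lambda code: depth.get(code, 1)  # plain characters have depth 1
    while len(entries) < MAX_ENTRIES:
        candidates = [
            (n, pair) for pair, n in count_pairs(lines).items()
            if n >= 2 and 1 + max(level(pair[0]), level(pair[1])) <= MAX_DEPTH
        ]
        if not candidates:
            break  # nothing left that would save a byte
        _, pair = max(candidates)  # most frequent pair; ties broken by code
        code = BASE + len(entries)
        entries.append(pair)
        depth[code] = 1 + max(level(pair[0]), level(pair[1]))
        lines = [replace_pair(line, pair, code) for line in lines]
    return entries, lines

def expand(code, entries):
    """Decode one code back to plain character codes (for checking)."""
    if code < BASE:
        return [code]
    left, right = entries[code - BASE]
    return expand(left, entries) + expand(right, entries)
```

Because each line is processed separately, no entry ever spans a line break. Since speed is no concern, an obvious refinement would be to try several of the top candidates at each step and keep whichever gives the smallest final size.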

I will add my current simple C# code/approach to this post later. It creates a dictionary and simply counts occurrences (but only with depth = 2, i.e. 2 characters). It is simple and does not do much. I hope someone can point me in the right direction or to code examples, or, even better, help with the coding.
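
For illustration, the one-pass counting idea looks roughly like this in Python (this is not the C# code mentioned above, and `top_pairs` is a made-up name). It counts every adjacent 2-character pair within each line and returns the most frequent ones.

```python
from collections import Counter

def top_pairs(text_lines, limit=60):
    """Count adjacent character pairs per line and return the most frequent."""
    counts = Counter()
    for line in text_lines:
        # zip(line, line[1:]) yields each adjacent pair; pairs never span lines
        counts.update(zip(line, line[1:]))
    return counts.most_common(limit)
```

The weakness is that the counts overlap: in "aaa" the pair "aa" is counted twice although it can be replaced only once, and choosing one pair changes the counts of its neighbours. So the top 60 pairs from a single pass overstate the real saving.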


Thank you very much.
