Hi
I've been trying lately to find a simple way to create a dynamic-length offset coding (This could also be usefull for Match Length, and even Literal count). Believing that this was already done by many other contributions, i digged my way into google, with surprisingly few result.
The only result i could find is a mention of Huffman encoding, which surprised me, because Huffman looks a good way to improve "specific values", but less suitable for complete ranges (or there is something that i've not understood then).
In order to illustrate what i'm looking for, here is an example :
Code:Fixed Size Offset Bit.range 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Signal.bits 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Offset.bits 14 14 14 14 14 14 14 14 14 14 14 14 14 14 Total.bits 14 14 14 14 14 14 14 14 14 14 14 14 14 14 Occurence 20 5 5 5 6 10 20 30 40 45 45 35 25 15 Total.bits 280 70 70 70 84 140 280 420 560 630 630 490 350 210 4284 Step 4 Multi range with Huffman header Bit.range 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Signal.bits 3 3 3 2 2 2 2 2 2 1 1 1 3 3 Offset.bits 3 3 3 9 9 9 9 9 9 12 12 12 14 14 Total.bits 6 6 6 11 11 11 11 11 11 13 13 13 17 17 Occurence 20 5 5 5 6 10 20 30 40 45 45 35 25 15 Total.bits 120 30 30 55 66 110 220 330 440 585 585 455 425 255 3706 Range Occ. 20 25 30 5 11 21 41 71 111 45 90 125 25 40
The above example can be computed relatively simply. Huffman is used to select optimal header (Signal.bits), but not for selecting the ranges themselves.
The calculation has also a few inefficiencies. For example, the 1st range uses 3 bits (1-, the 2nd range uses 9 bits, but starts at 9 instead of 1. So the second range is [10-521], and value 520, which is counted in the 3rd range in this calculation (12 bits), will be in the second range instead. This will save a few extra-bits, as a consequence the final number of bits is likely to be better than the one calculated. I doubt that this would change the range selection, but it has to be acknowledged that this effect is not taken into consideration by the algorithm.
Once again, this is just "homework", but i'm sure this has already been extensively tested/researched elsewhere, and just need to turn my head towards the right direction.
Any idea ?