Hi all, this is my first post to this list. I would like to present Glue, a compressor I have written:
it combines the LZW method with prediction over a sorted dictionary, the technique that in the LZSS family is termed ROLZ.
The Pascal source code can be found at
http://digilander.libero.it/borsa77
under the link "Lempel-Ziv arc". The executable is compiled with FPC for DOS, and the same archive
also contains the classic work of Okumura and Yoshizaki.
As described by Ross Williams in the explanatory paper for his LZRW4, the dictionary is split into 256 groups
selected by the first character of the word, each group holding up to 255 items; this is aimed at reducing pointer
size, and the actual group prediction is attempted at orders 1 and 0.
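To make the grouping concrete, here is a small Python sketch (names and structure are mine, not taken from the Glue source): phrases are filed under the group of their first byte, so a match can be sent as an 8-bit index within the group instead of a full dictionary pointer.

```python
GROUPS = 256        # one group per possible first byte
GROUP_SIZE = 255    # items per group, so an intra-group index fits in 8 bits

class GroupedDict:
    """Illustrative grouped dictionary; not the actual Glue data structure."""
    def __init__(self):
        self.groups = [[] for _ in range(GROUPS)]

    def add(self, phrase: bytes):
        g = self.groups[phrase[0]]
        if len(g) < GROUP_SIZE:      # in the real program a full group is pruned
            g.append(phrase)

    def pointer(self, phrase: bytes):
        """Return the intra-group index, or None if the phrase is absent."""
        g = self.groups[phrase[0]]
        return g.index(phrase) if phrase in g else None

d = GroupedDict()
d.add(b"ab")
d.add(b"abc")
d.add(b"de")
print(d.pointer(b"abc"))   # 1: second phrase in group 'a'
print(d.pointer(b"de"))    # 0: first phrase in group 'd'
```

Since the decoder already knows which group is in play from the prediction context, only the short intra-group index needs to be transmitted.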
At order 1 a recency rank list is maintained for every group, and coding is performed with Elias gamma. The
list is quite short (its length is set by the constant Span), and characters are never inserted at this
order. Two escape codes can occur, signalling whether the symbol is a string or a character. In either case
order 0 takes care of coding it with dynamic Huffman, omitting the pointer when it is a character.
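For readers unfamiliar with it, the Elias gamma code of an integer n >= 1 is floor(log2 n) zeros followed by the binary representation of n, so low ranks (recently used entries) get the shortest codewords. A minimal sketch (my own; it assumes ranks are shifted to start at 1 before coding):

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: floor(log2 n) zeros, then n in binary."""
    assert n >= 1
    b = bin(n)[2:]                    # binary form, leading '1' included
    return "0" * (len(b) - 1) + b

print(elias_gamma(1))   # '1'     - rank 0 costs a single bit
print(elias_gamma(2))   # '010'
print(elias_gamma(5))   # '00101'
```

This matches the recency list well: the more often a group is re-predicted, the closer it sits to rank 0 and the fewer bits its code takes.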
At order 1 the probabilities of the escape codes are adjusted with the move-to-front method, while for
symbols a one-step shift is used. A nice feature of the LZW dictionary is that, when it is managed as a trie,
every item holds a successor list which can be used as an exclusion list for the prediction (Horspool).
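The two update rules just mentioned can be sketched in a few lines of Python (illustrative only, not the Glue source): escapes jump straight to the front of the list, while ordinary symbols climb one position at a time.

```python
def move_to_front(ranks, i):
    """Promote the item at position i straight to the front (escape codes)."""
    ranks.insert(0, ranks.pop(i))

def one_step_shift(ranks, i):
    """Swap the item at position i one place toward the front (symbols)."""
    if i > 0:
        ranks[i - 1], ranks[i] = ranks[i], ranks[i - 1]

r = ['c', 'd', 'e']
one_step_shift(r, 2)   # 'e' climbs one position: ['c', 'e', 'd']
move_to_front(r, 2)    # 'd' jumps to the front:  ['d', 'c', 'e']
print(r)               # ['d', 'c', 'e']
```

The one-step shift makes symbol ranks change slowly, while move-to-front lets the escape probability react quickly when the statistics turn against the order-1 model.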
As shown in the following example
Dict: ab,abc,de
Input: ab|de
Order 1 rank for b: (0)c,(1)d
the code for d can be 0 instead of 1; it should be noted that this approach works only with greedy parsing.
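The exclusion step of the example can be mirrored in code (a hypothetical illustration, not the Glue source): after the greedy match "ab", the trie shows that "abc" exists, so had the input continued with 'c' the parser would have taken the longer match; 'c' is therefore impossible here and can be removed before ranking 'd'.

```python
successors_of_ab = {'c'}       # children of the node "ab" in the LZW trie
rank_list_for_b = ['c', 'd']   # order-1 recency list, as in the example

# Drop every candidate the greedy parser has already ruled out (Horspool):
effective = [s for s in rank_list_for_b if s not in successors_of_ab]
print(effective.index('d'))    # 0 instead of 1: a shorter gamma code for 'd'
```

With greedy parsing this exclusion is sound by construction, which is exactly why the trick fails under optimal or lazy parsing, where a shorter match may be chosen deliberately.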
The frequencies in Vitter's dynamic Huffman algorithm are never scaled down; this global modelling, juxtaposed
with the local modelling imposed by the Span constant, may help handle a large spectrum of possible inputs.
Because of the lack of correlation between groups, the relative pointers are not coded; to make up for this, when a group
becomes full the ideal strategy would be to discard the least recently used leaf (Storer). Since this requires a dedicated
data structure, I turned instead to the shrink method (PKWARE, British Telecom), where all leaves are pruned, with
the little refinement of limiting their number to 32 to avoid wasting one bit in the pointer.
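A sketch of that pruning step, under my reading of the description (the cap of 32 keeps the group from shrinking below 255 - 32 = 223 items, so the 8-bit pointers stay fully used; names and details are illustrative, not from the Glue source):

```python
MAX_PRUNED = 32   # cap on discarded leaves, per the refinement described above

def prune_leaves(group, successors):
    """Remove up to MAX_PRUNED phrases that have no successors in the trie."""
    pruned = 0
    kept = []
    for phrase in group:
        if pruned < MAX_PRUNED and not successors.get(phrase):
            pruned += 1          # a leaf within quota: drop it
        else:
            kept.append(phrase)  # an internal node, or quota reached: keep it
    return kept

group = [b"ab", b"abc", b"de"]
succ = {b"ab": [b"abc"]}         # "ab" has a child; the other two are leaves
print(prune_leaves(group, succ)) # [b'ab']
```

Pruning only leaves is safe because no other dictionary entry extends them, so no live trie path is cut.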
My goal in developing this program was to reach the performance of the deflate algorithm with a comparable dictionary
size; with respect to speed, I was resigned to the cost of a dynamic entropy coder, as in PPM. In compression ratio Glue
does worse by a factor of about 10%. I am not satisfied with this result, but I think the program may at least be didactic:
as far as I know, there is no other example of a grouped LZW.
Greetings, Marco Borsari.