Hi all,
in this episode I am glad to tell about an attempt of my own to dispense
the problem previously encountered in Grouped (ROLZ) LZW:
the fixed size of the groups (dictionaries attached to context).
A way to proceed is illustrated by Nakano, Yahagi, Okada but I started
from a different consideration.
Every time a symbol occurs in a text, it gains an increasingly number of
children and the chance for it to reappear is more and more smaller,
while an entropy stage which takes its output assigns shorter codes. To
arrange this contrast I settled down a schema where symbols
belong to a set of lists entitled for the number of children, and each
list is organized as LRU.
A symbol then will be emitted by its list and rank inside it,
respectively via the Witten Cleary arithmetic coder and the Elias delta.
I choosed an AC because it is the solely that can mimic closely the
fatherhood distribution among symbols, but this premise put me in front
of the necessity to interleave its sequence.
After a complicated period I realized that the solution must be based on
two facts: (i) the encoder has to be ahead of two symbols because the
decoder needs to start with 16 bit; (ii) the variables high and low (which
define the focus interval) are in lockstep between the just mentioned two
sides. To cope with these constraints I decoupled the LZW part from the
entropy one by a queue data interface, registering the number of bit
elaborated.
At the moment the compressor is terrible, both in terms of
speed and ratio, but I made it public as the interleaver could be of
some interest.
I have in mind to improve the performance of the compressor imposing on
it a more canonical context apparatus, that should curtail the recency lists
at the expense of memory consumption.
I hope to be back soon,
greetings, Marco Borsari