Got an idea for CM coder yesterday - again with compressed data structures basically.

There're currently two incompatible methods -

bytewise coders like ppmd, which use trees for statistics, and

bitwise coders like paq with hashtables and bitwise statistics.

And there're many benefits in each method,

but they don't combine - binary tree would take too much memory,

and dynamic symbol ranking like in ppmd doesn't work with probability-based statistics.

But now i've got a workaround - use fixed binary trees like in bitwise coders

but store them in compressed form, without empty nodes.

This allows to use some benefits of bytewise coders with bitwise.

For example, a coder like ppmd can be given 1G for encoding,

but if it uses only 10M (small file or something), then it can be decoded with 10M,

and it doesn't work like that with hashtables -

if paq compressed a 100 byte file with 1G hashtable,

it won't be able to decode it without the same table.

One problem though, is that i don't know how fast this can be :)

That is, ppmd-like speed is probably possible,

but access to that kind of data structure would be complicated

Well, it would be useful somewhere anyway :)

To be specific, I'm talking about something similar to nibble mask

tree in my "tree" compressor.

Tree nodes there are packed like this:

Latter version is actually more compact with more symbolsCode:Suppose we have a context with '0' x 3, '1' x 2, 'A' x 1 (ie context history is 00011A) Then ppmd's tree symbol table would be like this: 'A':1 '1':2 '0':3 (3*(1+1+4)=18 bytes with byte counters) And the context node in "tree" would be like this: 0000,0000,0001,1000 // 3x,4x enabled 0000,0000,0000,0011 // x0,x1 enabled = 0x30,0x31 0000,0000,0000,0010 // x1 enabled = 0x41 3 2 1 (2+2*2+3*(1+4)=21 bytes)

(ie if we add 5 more symbols 'B'-'F', it would be 18+6*5=48

vs 21+5*5=45)

And its main point is that it allows to locate a symbol

by code without linear scan.

But these masks were inherited from another experiment with

alternative statistical storage structures ("ctx" with layered

context masks), so I didn't notice that it is actually a compressed

data structure - a full frequency table for 256 symbols, where

zero freqs are compressed by disabling their bits in low nibble masks,

and all-zero masks are again compressed by high nibble mask.

Anyway, this same bitmask compression method can be applied to

binary trees as well.

So now I can make a bitwise CM with good counters (linear or fsm

instead of freqs) and a tree with precise statistics

(so more cache-friendly and without extra multiplication per order

for hash functions).