http://jveness.info/software/cts-v1.zip (x86+x64+source)
It's slow and not especially good compared to existing coders.
But at least it's something new :)
Wanted to test on something more relevant than obj2, but this thing ate ~4 GiB of RAM on a 6 MiB file, so I killed the beast.
Code:
Compressing obj2 (246814 bytes)
compressor   size    time   work.set
----------   ------  -----  --------
ctw          98378   69     677
facctw       88702   74     1266
cts          80043   9      274
gz           78176
bz2          76441
faccts       72642   10     409
7z           61525
lpaq8 6      57961   0.5    178
fp8 -5       50001   3      92
uda          48483   4      152
paq8px -5    44334   21     170
Don't know why the hell anyone would use this. Tonnes of math, hundreds of pages of science - but the result?..
I just had a beer with Joel yesterday
Cheers
Thanks for the mention. Yes, it's very much just a proof-of-concept implementation. Just to be clear, the paper/code is about comparing the recursive weighting scheme in CTW with the one used in CTS, while keeping similar theoretical guarantees. It's not meant to be a practical compression tool. A more memory-friendly version is of course possible, but that wasn't my goal for the first release. I hope this clarifies things.
I've looked at CTS as an end user, not as a programmer or researcher. And I wouldn't understand 90% of the papers anyway.
But I am just curious - what is the reason to work on such ideas? CTW itself never had any comparatively good results in benchmarks.
So improving it that much to make it usable... Nevertheless, as Shelwien said - at least it's something new.
"Something new" is good enough reason to try. The work seems to show that CTx way is dead way![]()
@jveness:
Anyway, do you have any idea about potential compression improvement for your model?
The pure PPMII (ppmonstr) / CM (ash) results for book1 are about 2.11bpc.
Afair paq8's results are similar too, if one hacks it up to disable "unfair" text models.
The model would still have some uses (like HP) even if it's slow, but only if it's better than
existing implementations.
To be specific,
ppmonstr is http://compression.ru/ds/ppmdj1.rar (book1=203247 with defaults)
ash is http://www.ctxmodel.net/files/ASH/ash07.rar (book1=203361 with defaults)
or http://www.ctxmodel.net/files/ASH/ash04a.rar (book1=202932 with /s12)
(v07 has lower counter precision to reduce memory usage)
@nimdamsk: As a researcher, I find CTW interesting since it is one of the few methods that provides really strong theoretical guarantees. So with CTS, my motivation was to see how far I could push the performance while still keeping good theoretical guarantees. I can understand though if this motivation seems a bit weird.
@Shelwien:
My latest development version of CTS (which is marginally better than the source associated with the paper) gives:
book1, 207727 bytes, 2.16bpc
So it is worse, but maybe it is possible to catch up or do a little better with more work. I do have some more ideas, so I haven't completely given up hope just yet.
Can you tell me what you mean by HP? Sorry, I am unfamiliar with the acronym.
I thought you'd know it, of all people - http://prize.hutter1.net/
207727 is better, but still not good enough, as that's still in the range for BWT compressors
(see http://compressionratings.com/bwt.html#book1)
bwtmix with alphabet reordering also reaches 207130
Btw, also for reference there's http://encode.su/threads/541-Simple-...xt-mixing-demo
It gives about 214k, and I think that any real CM/PPM/BWT should be better than that.
And as to "more ideas", are you aware of the secondary estimation / probability mapping idea?
It's the main difference between ppmonstr and ppmd, and ash's result with disabled SSE is 205417.
Re: Hutter prize: Oh, yeah, that.
Re: secondary estimation / probability mapping
I think Chris told me about this idea a couple days ago. Is this the idea where you maintain some kind of 2d table, which you use to improve the output from your model by incorporating information from a match model?
By the way, I very much appreciate these suggestions.
> Is this the idea where you maintain some kind of 2d table, which you
> use to improve the output from your model by incorporating
> information from a match model?
Yes, but that's not the point at all.
The main idea is that to improve some probability estimation,
we can collect secondary statistics in context of that probability.
Such components are usually called "SSE" (secondary symbol estimation) -
it was first used by C.Bloom in ppmz as "SEE" (secondary escape estimation),
then extended to all symbols by D.Shkarin in ppmonstr... then I decompiled
ppmonstr and it propagated further :).
So the initial version was like this:
Code:
Counter X[Q];                    // secondary stats
p = 1..SCALE-1;                  // primary prediction
q = p*Q/SCALE;                   // quantized index
p' = X[q];                       // secondary prediction
X[q].Update( bit );
then it was improved by adding linear interpolation:
Code:
Counter X[Q+1];                  // secondary stats
p = 1..SCALE-1;                  // primary prediction
q = p*Q/SCALE;                   // integer node index
w = (p*Q - q*SCALE)/SCALE;       // fractional part, 0..1
p' = (1-w)*X[q+0] + w*X[q+1];    // secondary prediction
X[q+0].Update( bit );
X[q+1].Update( bit );
(see bsc_mixer.inc in http://nishi.dreamhosters.com/u/bsc240_qlfc_v2.rar )
then by "indirect update"
(update the mixed value as a counter, then back-propagate the difference):
Code:
Counter X[Q+1];                  // secondary stats
p = 1..SCALE-1;                  // primary prediction
q = p*Q/SCALE;                   // integer node index
w = (p*Q - q*SCALE)/SCALE;       // fractional part, 0..1
p' = (1-w)*X[q+0] + w*X[q+1];    // secondary prediction
Counter tmp = p';
tmp.Update( bit );
d = tmp - p';                    // difference produced by the indirect update
X[q+0] += d;                     // back-propagate it to both nodes
X[q+1] += d;
Of course, it's helpful to use extra context for secondary statistics -
it's possible to add things to the secondary context that won't fit
as primary, mainly because probability mappings can be initialized
as "identity transformation".
In the same way, it's possible to extend the approach to multiple dimensions
and mix submodel predictions with it.
The results of that are fairly good, but it's relatively slow.
(see http://encode.su/threads/1158-SSE2(o...ll=1#post22841 )
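To give a rough idea of the multi-dimensional case (again my own sketch, not the actual SSE2 code from that thread): quantize two submodel predictions, index one table by the pair, and interpolate bilinearly over the four surrounding nodes:
Code:
#include <cstdint>
#include <vector>

// Rough sketch of a 2D probability mapping over two submodel predictions,
// reusing the illustrative SCALE/Q conventions from the 1D sketch above.
struct Sse2d {
  static constexpr int SCALE = 4096, Q = 32, RATE = 5;
  std::vector<uint16_t> X;                   // (Q+1)*(Q+1) nodes
  int n00 = 0, n01 = 0, n10 = 0, n11 = 0;    // nodes picked by the last Predict()

  Sse2d() : X((Q + 1) * (Q + 1)) {
    // start from a plain average of the two inputs (an arbitrary choice;
    // an identity in either input would also work)
    for (int a = 0; a <= Q; a++)
      for (int b = 0; b <= Q; b++)
        X[a * (Q + 1) + b] = uint16_t((a + b) * SCALE / Q / 2);
  }

  // combine two submodel predictions pa, pb (both 1..SCALE-1)
  int Predict(int pa, int pb) {
    int qa = pa * Q / SCALE, wa = pa * Q - qa * SCALE;   // node + fraction
    int qb = pb * Q / SCALE, wb = pb * Q - qb * SCALE;
    n00 = qa * (Q + 1) + qb; n01 = n00 + 1;
    n10 = n00 + (Q + 1);     n11 = n10 + 1;
    // bilinear interpolation over the four surrounding nodes
    int64_t top = int64_t(X[n00]) * (SCALE - wb) + int64_t(X[n01]) * wb;
    int64_t bot = int64_t(X[n10]) * (SCALE - wb) + int64_t(X[n11]) * wb;
    int64_t p = (top * (SCALE - wa) + bot * wa) / SCALE / SCALE;
    if (p < 1) p = 1;
    if (p > SCALE - 1) p = SCALE - 1;
    return int(p);
  }

  // train all four nodes on the observed bit, as in the 1D case
  void Update(int bit) {
    int t = bit ? SCALE - 1 : 0;
    int nodes[4] = { n00, n01, n10, n11 };
    for (int k = 0; k < 4; k++)
      X[nodes[k]] += (t - X[nodes[k]]) >> RATE;
  }
};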
The most important point is that SSE can at the same time capture estimation errors
(e.g. learn to map prior estimations to posterior ones) _and_ actual data correlations
(specific patterns in the context history) - but the latter only works with static
functions of histories. It's important for understanding why linear_counter+SSE
compresses better than adaptive_rate_counter+SSE, even though a standalone
adaptive_rate_counter is significantly better than a linear_counter
(it's also why it's rarely a good idea to apply SSE multiple times, or after mixing).
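For reference, a sketch of the two counter flavours being contrasted above - this is just my reading of the terms, not code from ash or ppmonstr:
Code:
// "linear" counter: a fixed adaptation rate, so the value is a fixed (static)
// function of the recent bit history, which a secondary table can model well
struct LinearCounter {
  int p = 2048;                            // probability of 1, scale 4096
  void Update(int bit) {
    p += ((bit ? 4096 : 0) - p) >> 5;      // constant shift
  }
};

// adaptive-rate counter: the rate depends on how many updates were seen, so
// two counters with identical recent histories can hold different values -
// a less "static" function of history, and harder for SSE to correct
struct AdaptiveCounter {
  int p = 2048;
  int n = 0;                               // updates seen so far (capped)
  void Update(int bit) {
    if (n < 30) n++;
    p += ((bit ? 4096 : 0) - p) / (n + 1); // adaptation slows down over time
  }
};
Standalone, the adaptive-rate counter usually predicts better, but after SSE the fixed-rate one tends to win, for the reason described above.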