E2160 @ 9x360=3.24, DDR2-800 5-5-5-18 @ 900
http://shelwien.googlepages.com/cm200805.htm
Also, here are a Windows port of m01 and the "fpaq0f2x" source.
Last edited by Shelwien; 30th May 2008 at 03:43.
Thanks for testing.
Now, here's something exclusive: http://shelwien.googlepages.com/cm200505-iphone.htm
Quite the different metric ranking...
Well, no wonder since the damn thing has only a 16k read cache,
no write cache, and really slow multiplications (even integer ones go via the FPU).
Interesting test! Thanks, Shelwien!
http://shelwien.googlepages.com/cm200806.htm
fcm2a added (when will you learn to properly compile things yourself?)
o2_xx SSE2 coders retested and added (from sse.htm)
fcm2sh4 added
- you have too many different scales there; it slows things down.
- I managed to get better results than fcm2, but I guess my mixer
really doesn't like the nonlinear counters. Also note that it's significantly
better on most of the files but loses too much on fp.log (same goes for
m01 vs o2_xx, btw). Guess I'd have to do something special about
deterministic contexts.
OK, thanks! Actually, it's indeed better to use different update rates for lower and higher orders. Maybe this gives most of the compression difference. Also, it was specially tuned... Mine wasn't. I still don't really believe in your shamanic weight update. Interesting how it will work with an additional context - as FCM2a does.
Translating to human language your update:
If the prediction difference between p1 and p2 reaches some threshold (below or above), correct the weight;
__min/__max just set the bounds of the weight.
OK, the division by SCALE will be replaced by a shift by the compiler, but the division by (p2-p1) will be kept as is.
((SCALE-wr)*w + SCALE-1)/SCALE + wr*(P-p1)/(p2-p1)
At first sight it's computationally expensive. However, according to your timings it works pretty fast. I'll test such a thing... But at the moment I like mine...
Your "fcm2sh4_xi_ipo_prof-3" is quick. That would be quite a challenge for my optimised compiles.
That's an exponentially smoothed (1-wr, wr) blend of the old weight and the solution of a counter back-propagation. Damn, I've rewritten it 3 times:
((SCALE-wr)*w + SCALE-1)/SCALE + wr*(P-p1)/(p2-p1)
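For readers following along, a minimal C sketch of this update; SCALE, wr, mlim and all names here are illustrative assumptions, not the actual fcm2 source:

```c
#include <assert.h>
#include <stdlib.h>

/* probability scale, assumed power of two */
#define SCALE 4096

static int clip(int x, int lo, int hi) { return x < lo ? lo : x > hi ? hi : x; }

/* mix two predictions p1, p2 (in [0,SCALE)) with weight w */
static int mix2(int w, int p1, int p2) {
    return p1 + w * (p2 - p1) / SCALE;
}

/* smoothed step toward the weight that would have produced target P:
   w' = ((SCALE-wr)*w + SCALE-1)/SCALE + wr*(P-p1)/(p2-p1) */
static int update_w(int w, int wr, int P, int p1, int p2, int mlim) {
    if (abs(p2 - p1) < mlim) return w;   /* skip near-equal predictions */
    int nw = ((SCALE - wr) * w + SCALE - 1) / SCALE + wr * (P - p1) / (p2 - p1);
    return clip(nw, 0, SCALE);           /* bounds, as with __min/__max */
}
```

The |p2-p1| threshold skips the expensive division whenever the two predictions nearly agree.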
Last edited by toffer; 2nd June 2008 at 21:49.
> it's indeed better to use different update rates for lower and higher orders.
Also, better not to use power-of-2 speeds and noisy += updates.
> Maybe this gives most of the compression difference.
I don't think so, as a counter with shift 2 is not really a counter.
Probably replacing your counter pair with a single counter of mine would be beneficial
for this coder.
> Also it was specially tuned...
Now, after I really tuned it right (with the sum of separately compressed files),
it shows 24171068.
> Mine wasn't.
Algorithm selection is tuning too.
> I still don't really believe in your shamanic weight update.
This one is not actually shamanic.
It just tries to update the mixed probability the same way
as if it were a counter (i.e. remixing after the mixer update would
result in the updated probability value).
So if you don't update the original counters, this mixer's output
would behave exactly as if it were a single counter.
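A small floating-point check of that equivalence (a sketch; all names and the [0,1] scaling are assumptions):

```c
#include <assert.h>
#include <math.h>

/* p1, p2 are the (frozen) inputs, w the weight, wr the counter rate,
   P the target; all values in [0,1] for clarity */
static double mixf(double w, double p1, double p2) {
    return p1 + w * (p2 - p1);
}

/* returns |remix(updated w) - counter-style update of the mix| */
static double counter_equiv_err(double w, double wr, double P,
                                double p1, double p2) {
    double pm = mixf(w, p1, p2);
    /* exact-arithmetic version of the integer weight update */
    double nw = (1.0 - wr) * w + wr * (P - p1) / (p2 - p1);
    double remixed = mixf(nw, p1, p2);
    double counter = pm + wr * (P - pm);   /* plain counter update of pm */
    return fabs(remixed - counter);        /* should be ~0 */
}
```

Algebraically, remixing with the updated weight gives (1-wr)*pm + wr*P, which is exactly the counter-style update of pm.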
> Interesting how it will work with an additional context - as FCM2a does.
I would probably always be able to beat your results after tuning various things -
of course, only if the coder is simple enough.
> If the prediction difference between p1 and p2 reaches some threshold
> (below or above), correct the weight; __min/__max just set the bounds of the weight
Both the |p1-p2| check and the clipping are only necessary because of the imprecise integer calculation there.
> OK, the division by SCALE will be replaced by a shift by the compiler,
> but the division by (p2-p1) will be kept as is.
It can actually be replaced with a table lookup and a multiplication.
And it's also skipped most of the time because of similar probability values.
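For illustration, one way such a table could look (the 4096 range and 24-bit scale are assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* replace n/d by a multiplication with a precomputed reciprocal */
#define RBITS 24
static uint32_t recip[4096];

static void init_recip(void) {
    for (int d = 1; d < 4096; d++)
        recip[d] = ((1u << RBITS) + d - 1) / d;  /* rounded-up reciprocal */
}

/* approximates n/d for 0 < d < 4096 (within rounding of the true quotient) */
static int div_lut(int n, int d) {
    return (int)(((int64_t)n * recip[d]) >> RBITS);
}
```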
> But at the moment I like mine...
Well, as I said, I did use something like that too.
But this mixer is universal, while "yours" won't be good in different circumstances.
Last edited by Shelwien; 2nd June 2008 at 22:15.
Exactly! Mixing just two probabilities with known nature is what we have here. Mostly we know that order-1/2 is more precise than plain order-0. That's untrue only at the beginning of encoding - while order-1/2 hasn't gathered the needed amount of statistics - or in some extreme cases. Also, it's mainly true for such simple encoders as FCM2, and especially on text files. In different situations, like LZ-output coding, things are slightly different. However, for simple tasks we should use simple things...
http://shelwien.googlepages.com/cm200806.htm updated.
- m012 v2 added (as m012a)
- fcm2sh5 added
@toffer: Great. 2M less and you'll reach ppmonstr's results with o2
(well, its o2 is unfair, as it uses extra contexts in SSE and
there's also a sparse o1 submodel).
@encode: These are new constants for the fcm2sh4->sh5:
static const int FCM2sh4_s1r1 = 1;
static const int FCM2sh4_s1r2 = 3;
static const int FCM2sh4_s2r1 = 2;
static const int FCM2sh4_s2r2 = 5;
static const int FCM2sh4_mlim = 6127;
static const int FCM2sh4_mwr = 116;
static const int FCM2sh4_mmw = 0;
@LP: Looks like a hit this time (though I tested only one build).
Thanks, Shelwien!
Now, here's something troublesome to test...
http://shelwien.googlepages.com/st2rc_v0.rar
http://shelwien.googlepages.com/st2rc_v0.htm
Here's more:
http://shelwien.googlepages.com/st2rc_v1.rar
http://shelwien.googlepages.com/st2rc_v1.htm
Any comments?
Btw, making this, I remembered about distance coding.
So, here's a question: did you try LZ with _forward_ references?
Like "copy the next string of specified length to specified offset"?
Because distance coding is known to work better than plain MTF,
and MTF is an LZ without match lengths.
ROLZ or not doesn't matter.
If you encode backward references, then you can encode forward references as well.
No.
With MtF you store the number of different symbols since the last occurrence of the given symbol.
And with DC you store the offset to the next occurrence of the given symbol.
Well, usually the DC offset is also masked by not counting the already-referenced cells...
which is the same as subtracting the MtF rank of the symbol at the target location.
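To make the difference concrete, here's an illustrative (unoptimized) computation of both quantities:

```c
#include <assert.h>

/* MtF rank: number of distinct symbols seen since the previous
   occurrence of s[i] (scanning backward from position i) */
static int mtf_rank(const unsigned char *s, int i) {
    int seen[256] = {0}, rank = 0;
    for (int j = i - 1; j >= 0; j--) {
        if (s[j] == s[i]) return rank;
        if (!seen[s[j]]) { seen[s[j]] = 1; rank++; }
    }
    return rank;  /* symbol not seen before */
}

/* plain DC offset: distance to the next occurrence of s[i], 0 if none */
static int dc_offset(const unsigned char *s, int n, int i) {
    for (int j = i + 1; j < n; j++)
        if (s[j] == s[i]) return j - i;
    return 0;
}
```

The masked variant mentioned above would then subtract the count of already-referenced cells between the two occurrences from the plain offset.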
http://shelwien.googlepages.com/st2rc_v2.rar
http://shelwien.googlepages.com/st2rc_v2.htm
- compression improved by switching to order1 model of MTF output
- tested the effect of disabling the fpaq0pv4B "macrofication" preprocessor
- tested the rc_mm (original rangecoder from fpaq0p) vs rc_sh1m
Now v3 has better compression than toffer's m012a, at 3x the speed.
http://shelwien.googlepages.com/st2rc_v3.rar
http://shelwien.googlepages.com/st2rc_v3.htm
- adaptive mixing of order0 & order1 added
- MTF disabled as it hurts compression %-)
- E8/E9 filter added
- temporary switch to rc_mm as rc_sh2d produced 200k worse results
(due to simplified renormalization)
And how about ST4?
Goals of this experiment are:
1. Investigation of context histories' properties
(And showing toffer that order-0 modelling is not enough for context histories)
2. rc_mm vs rc_sh comparison
3. counters comparison (Shift-counters vs Node2i for now, may include
delayed counters later)
4. binary vs unary encoding
5. mixing and SSE
So I'm not sure about ST4. There's not enough address space to do it directly,
and I don't have any particularly interesting ideas about its implementation.
Although maybe we could use Matt's bbb instead.
Next plans:
1. Investigating the reason for the rc_sh redundancy (long sequences of improbable
values are clearly wrong)
2. Unary/binary symbol coding
3. SSE
For my next CM experiments I plan to do unary coding - you convinced me of the speed and compression gain (and, as I said, I did a static variable-length code decomposition, which was notably faster than a straight flat decomposition).
One clearly needs some PPM data structures. Could you tell me any references?
Well, I only said that to attract attention. But it's true too.
And the result without E8 is included for reference. It's worse than yours, but not by that much.
Also, of course E8 is unfair, but ST2 is unfair too - the coder has to
process the same data-type changes 64k times.
Btw that's why I'm not particularly interested in ST4 or BWT for this.
Last edited by Shelwien; 6th June 2008 at 14:57.
> For my next CM experiments i plan to do unary coding
That's great.
There are also things to check out in that area.
For example, all the known implementations use MtF ranking,
but it would be best to design some fast model that
assigns rank 0 to the really most probable symbol.
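As a naive sketch of what such a ranking could look like - per-context counts instead of recency; the structure and all names here are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical per-context statistics */
typedef struct { uint16_t cnt[256]; } Ctx;

/* rank 0 goes to the symbol with the highest count so far
   (ties broken by symbol value); O(256) scan, illustration only */
static int freq_rank(const Ctx *c, unsigned char sym) {
    int r = 0;
    for (int i = 0; i < 256; i++)
        if (c->cnt[i] > c->cnt[sym] ||
            (c->cnt[i] == c->cnt[sym] && i < sym)) r++;
    return r;
}
```

A real implementation would of course keep the symbols sorted incrementally rather than rescan on every lookup.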
> One clearly needs some PPM data structures.
There's not much choice actually.
Well, you could keep the pointers to context nodes in a hash,
but there's no reason to, as it would be faster with a tree.
And in the same way, you could link the symbol nodes into a list,
but it's obviously faster to use arrays.
Although you can choose whether you really want a suffix tree
(where the "abc" node contains a link to the "bc" node),
that only makes sense if you're going to make a PPM, which
would mostly encode all symbols in the highest order.
Also, I've recently come to think that it should be viewed as a caching system:
any necessary statistics can be dynamically calculated from the data,
and then you can cache the frequently accessed statistics.
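A rough sketch of context/symbol nodes along these lines (field names and sizes are guesses, not taken from ppmd):

```c
#include <assert.h>
#include <stdint.h>

/* symbol nodes are kept in an array per context, not a linked list */
typedef struct {
    uint8_t  sym;    /* symbol value */
    uint8_t  freq;   /* occurrence count */
    uint32_t child;  /* index of the longer context continuing with sym */
} SymNode;

/* a node of the context tree */
typedef struct {
    uint16_t nsyms;  /* number of distinct symbols seen here */
    uint16_t total;  /* sum of freqs, for probability estimation */
    uint32_t suffix; /* suffix link: "abc" -> "bc", used for PPM escapes */
    SymNode *syms;   /* symbol array */
} CtxNode;
```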
> Could you tell me any references?
http://compression.ru/download/artic..._ppmii_pdf.rar
http://en.wikipedia.org/wiki/Suffix_tree
http://www.compression.ru/ds/ppmdj1.rar (SubAlloc.hpp)
http://www.compression.ru/ds/ppmsj.rar
http://compression.ru/sh/ppmy_3c_sse.rar (ctx_tree.inc)
http://compression.ru/sh/ash04.rar (ash_tree.inc)
Thanks!
How many orders do you mix at once for a unary guess? Would something around three be OK? I'll try anyway, but I need a point to start.
By accident I found a way to estimate R0 symbols with a hit rate of 97-99% (!!!) on all files except acrord32.exe (there it was 92%). I guess I'll use this heuristic. This technique can be applied to around 20% of the data.
Something about your ST: do you encode the symbol streams *per order-2 context* (after the ST) using order 10 mixing? That clearly is something > order 2.
Last edited by toffer; 6th June 2008 at 18:37.