CMV compresses 2.txt better than 2*.bin (2.bin a bit better than 2a.bin), and the development version is even better.
There seems to be no "killer model"; the best result is a mix of almost all models and mixers.
Compressing {x;y-x}, or compressing all x and then all y, gives worse scores (at least with CMV).

Yes... I'm currently using a kinda-order1 (quantized) model with SSE for it.
As you can see in my stats, it's actually better than paq (and I improved it by ~200 bytes since then).
But maybe there's some simple trick I'm missing?
For example, it's possible to transform these two values to a 577-bit bitmap with 1-2 bits set.
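
A minimal sketch of that bitmap transform, assuming 0 <= x <= y <= 576 (the ordering is an assumption; without it the bitmap can't tell which value is x). The function names are hypothetical:

```python
NBITS = 577  # bit positions 0..576

def to_bitmap(x, y):
    # Assumption: 0 <= x <= y < NBITS, otherwise the order of x and y is lost.
    if not (0 <= x <= y < NBITS):
        raise ValueError("expected 0 <= x <= y <= 576")
    return (1 << x) | (1 << y)  # one bit set when x == y, otherwise two

def from_bitmap(bm):
    x = (bm & -bm).bit_length() - 1  # index of the lowest set bit
    y = bm.bit_length() - 1          # index of the highest set bit
    return x, y
```

The bitmap could then be coded bit by bit with a binary model, e.g. with the position and the number of bits already seen as context.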

There's actually an array of 576 MDCT coefs.
But coefs for higher freqs are more likely to be closer to 0.
So instead of encoding all 576 values for all frames, I have special handling for all-0 and 0/1 sections; their sizes are described by these numbers.

Also I have a special flag for =0 for x/y, but when I tried adding a similar flag for 576, it didn't work.
As to 418, it's likely file-specific.

Another visualization, looking at the values over "time", shows that some kind of dynamic coding is very useful. Y values often stay within an 8-bit range or even less for very long stretches; X values are noisier.

Well, my current result is 32782 (with basically single counter table + SSE).
And it seems that paq/cmix don't see any different dependencies that my model doesn't use.

I got 36791 bytes with an o0 I am currently working on. It's worse than your results but maybe it is still helpful somehow. Maybe I'll find time to build an o1 from it in the next few days.

Edit:
If I compress x and y independently, I get 21624 for x and 16208 for y.

What I'm actually interested in here is finding some obscure contexts, like if (x+y)%5==0 were always true.
I can also squeeze several bytes more out of it by using precise components, also there's always mixing, FSM counters etc.
Some methods could be even overtuned to get it down to 30k or so (by moving the remaining part to model coefs).
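
A search for such obscure contexts can be roughly automated by comparing the static entropy of y with and without a candidate context derived from x. This is only a sketch: the candidate functions and names below are hypothetical, and static entropy ignores the cost of learning the statistics, so contexts with many distinct values look better than they are.

```python
import math
from collections import Counter, defaultdict

def entropy_bits(symbols):
    """Total empirical (static) order-0 entropy of a symbol list, in bits."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(c * math.log2(n / c) for c in counts.values())

def conditional_entropy_bits(symbols, contexts):
    """Total entropy of symbols when each context gets its own statistics."""
    groups = defaultdict(list)
    for s, c in zip(symbols, contexts):
        groups[c].append(s)
    return sum(entropy_bits(g) for g in groups.values())

def context_gains(xs, ys, candidates):
    """Bits saved on ys by each candidate context function of x."""
    base = entropy_bits(ys)
    return {name: base - conditional_entropy_bits(ys, [f(x) for x in xs])
            for name, f in candidates.items()}

# Hypothetical candidates; real ones would come from looking at the data.
CANDIDATES = {"x%5": lambda x: x % 5, "x%4": lambda x: x % 4}
```

A context that really determines part of y (like the (x+y)%5 example) shows up as a large gain, while an unrelated one stays near zero.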

But different implementations are also possible, like the bitmap which I already posted.
Or maybe something blockwise, with min/max values per block and parsing optimization.
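
One way the blockwise idea could look, as a sketch under assumed costs: each block pays a fixed header (min, range width, length; 20 bits here is an arbitrary placeholder) plus a fixed number of bits per value, and a small dynamic program picks the block boundaries. All names and parameters are hypothetical.

```python
def best_parse(values, max_len=64, header_bits=20):
    """Return (total_bits, [(start, end), ...]) for the cheapest blocking.

    Cost model (an assumption): header_bits per block plus
    len(block) * bits needed to store values in 0..(max - min).
    """
    n = len(values)
    best = [0] + [float("inf")] * n   # best[i] = min cost of values[:i]
    cut = [0] * (n + 1)               # cut[i] = start of the last block
    for end in range(1, n + 1):
        lo = hi = values[end - 1]
        # Grow the last block backwards, tracking its min and max.
        for start in range(end - 1, max(end - max_len, 0) - 1, -1):
            lo = min(lo, values[start])
            hi = max(hi, values[start])
            cost = best[start] + header_bits + (end - start) * (hi - lo).bit_length()
            if cost < best[end]:
                best[end], cut[end] = cost, start
    blocks = []
    end = n
    while end > 0:                    # backtrack the chosen boundaries
        blocks.append((cut[end], end))
        end = cut[end]
    return best[n], blocks[::-1]
```

In a real coder the per-value part would be entropy coded rather than stored as raw bits, so this only gives a rough upper bound for comparing parses.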

If such a method would provide a way to save even 6-7k, that'd be very interesting already,
because I'm barely getting 1-2k out of large models (400-800k output with the same sample)
by adding logistic mixing and slowing them down to half speed.
While for header data like these offsets it's easy to afford using strong/slow models.

P.S. Here's another sample, of a different field ("global gain"). My current model compresses it to 10050, paq8px result is 9715, 7z (with delta etc) ~10852.

Edit: Well... I had a copy & paste bug and only compressed every 2nd value. It looked too good to be true, though. The old data are at the end.

Why is it easier to afford slow models for headers, because headers are smaller? But also the absolute gain will be smaller. So if you invest the same time into implementing better header compression or better data compression the gain in compressed size will be smaller for better header compression.
Or am I not seeing something here?

After some testing I came to the following conclusions:
- the correlation between every 2nd sample seems to be a little stronger than the correlation with the direct neighbor.
- about 10k seems to be the limit for simple models

Some results for glgain, interpreted as 8-bit uints. Probably no new insights for you, just a correction of the previous results:
- o0: 11992
- delta + o0: 12707
- delta(2 separate interlaced "streams") + o0: 11253
- adaptive linear filter for prediction: 10322
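
For anyone wanting to reproduce the delta variants, here is a rough sketch. It uses static order-0 entropy as the size estimate, which is an assumption and will not match the exact numbers above (those come from an actual coder); the function names are made up for illustration.

```python
import math
from collections import Counter

def order0_bytes(data):
    """Static order-0 entropy of a byte string, in bytes."""
    n = len(data)
    counts = Counter(data)
    return sum(c * math.log2(n / c) for c in counts.values()) / 8

def delta(data, stride=1):
    """Byte-wise delta against the value `stride` positions back (mod 256).

    stride=1 is a plain delta; stride=2 treats the data as two
    interleaved streams and deltas each one separately.
    """
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] - data[i - stride]) & 0xFF)
    return bytes(out)

def undelta(data, stride=1):
    """Inverse of delta()."""
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] + out[i - stride]) & 0xFF)
    return bytes(out)
```

Comparing order0_bytes(delta(buf, 1)) with order0_bytes(delta(buf, 2)) on the file contents gives a quick check of whether the every-2nd-sample correlation is really the stronger one.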

Old, erroneous data:
Compressing glgain as 8bit uints I get better results:

o0: 5942
delta + o0: 5570

Still no interesting contexts, though. But at least an improvement.

Last edited by Urist McComp; 13th September 2017 at 02:10.

> Why is it easier to afford slow models for headers, because headers are smaller?

Yes, we can use a much stronger model if it's called once per frame,
rather than 1152 times per frame.

> But also the absolute gain will be smaller. So if you invest the same time
> into implementing better header compression or better data compression the
> gain in compressed size will be smaller for better header compression.

Sure, that's exactly what I did,
while the header models basically stayed the same plain order0 models.

So when I checked the model contributions recently, it turned out
that some of the header models produce more code than some of the DCT coef models,
while also being much easier to improve.

> After some testing I came to the following conclusions:
> - the correlation between every 2nd sample seems to be a little stronger than the correlation with the direct neighbor.

Yes, there're 2 or maybe 4 streams (in glgain.bin):
(4 values per frame, 2 per channel)

Code:

12,272 // 7z a -mx=9 -myx=9 -m0=lzma:mt1:fb273:lc4:lp0:pb0 1.7z glgain.bin
12,163 // 7z a -mx=9 -myx=9 -m0=delta:1 -m1=lzma:mt1:fb273:lc5:lp0:pb1 1.7z glgain.bin
11,997 // 7z a -mx=9 -myx=9 -m0=delta:2 -m1=lzma:mt1:fb273:lc0:lp0:pb1 2.7z glgain.bin

> - about 10k seems to be the limit for simple models

Yes, but it's likely possible to push it to 9700-9500 with enough mixing.

> - adaptive linear filter for prediction: 10322

optimfrog - 10199 (LPC)
But paq8px-7 - 9715, so there is some context... actually 9586 using wav model.
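
For reference, an "adaptive linear filter" of the kind quoted above could be something like a normalized LMS predictor. This is a generic sketch, not the filter that produced the 10322 result and not what optimfrog does; the order, step size mu and function name are assumptions.

```python
def nlms(values, order=4, mu=0.5, decode=False):
    """Normalized LMS prediction over integer samples.

    Encode (decode=False): values are samples, returns residuals.
    Decode (decode=True):  values are residuals, returns samples.
    Both directions run the same float operations, so the round trip is
    exact on one machine (a real codec would want integer arithmetic).
    """
    w = [0.0] * order      # filter weights
    hist = [0.0] * order   # previous samples, most recent first
    out = []
    for v in values:
        pred = int(round(sum(wi * hi for wi, hi in zip(w, hist))))
        if decode:
            sample, err = v + pred, v
        else:
            sample, err = v, v - pred
        out.append(sample if decode else err)
        # Weight update, normalized by the energy of the history.
        norm = sum(h * h for h in hist) + 1e-9
        g = mu * err / norm
        w = [wi + g * hi for wi, hi in zip(w, hist)]
        hist = [float(sample)] + hist[:-1]
    return out
```

The residuals would then go to an order-0 (or context-mixing) coder; with 2 or 4 interleaved streams it may make sense to run one filter per stream.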