It's not the same situation as with BWT, since in a "lookup-table manner" (looking up the contexts) the ranks change "over time", i.e. with the file/buffer position. You would have to modify this:
c_(i-1) | r0_(i-1)
c_i     | r0_i      <- this is the currently viewed order-6 context c_i
c_(i+1) | r0_(i+1)
Knowing where c_i is located in the input data, probe backwards and check whether all truncated order-5 to order-2 contexts that occur first have the same R0 character.
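A naive sketch of that backward probe, under two assumptions: "occur first" means the first earlier occurrence found while scanning backwards from the current position, and a context's R0 character is the byte that followed that occurrence. `lastSuccessor` and `truncatedContextsAgree` are hypothetical names, and the linear scan only stands in for whatever index a real implementation would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: R0 character of the order-`order` context ending at
// `pos`, taken as the byte that followed the most recent earlier occurrence
// of the same `order` bytes. Naive O(n * order) backward scan.
std::optional<unsigned char> lastSuccessor(const std::vector<unsigned char>& buf,
                                           std::size_t pos, std::size_t order) {
    if (order == 0 || pos <= order || pos > buf.size()) return std::nullopt;
    const auto ctx = buf.begin() + (pos - order);  // current context bytes
    for (std::size_t j = pos - 1; j >= order; --j) {
        // buf[j-order .. j-1] is a candidate earlier occurrence of the context
        if (std::equal(ctx, ctx + order, buf.begin() + (j - order)))
            return buf[j];
    }
    return std::nullopt;
}

// True when the truncated order-5..2 contexts at `pos` all have an earlier
// occurrence and all of them predict the same R0 character.
bool truncatedContextsAgree(const std::vector<unsigned char>& buf, std::size_t pos) {
    std::optional<unsigned char> first;
    for (std::size_t order = 5; order >= 2; --order) {
        const auto c = lastSuccessor(buf, pos, order);
        if (!c) return false;               // no earlier occurrence: nothing to compare
        if (!first) first = c;
        else if (*first != *c) return false;
    }
    return true;
}
```

With "xabcdefYabcdef" every truncated context last continued with 'Y', so they agree; in "xabcdefYzzefQabcdef" the order-2 context "ef" last continued with 'Q' while the longer ones say 'Y'.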
I modified CMM4 to output the counts (correct R0 guesses / R0 guesses) again (note: I use hashing, so the true numbers must be higher due to unresolved collisions):
Code:
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\A10.jpg f:\testset\A10.jpg.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 828527/842468 bytes (7.87 bpc)
Speed: 278 kB/s (3505.2 ns/byte)
Time: 2.95 s
85/92 (92.39%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\AcroRd32.exe f:\testset\AcroRd32.exe.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Applying x86 transform.
Encoding: done.
Ratio: 1184968/3870784 bytes (2.45 bpc)
Speed: 369 kB/s (2640.0 ns/byte)
Time: 10.22 s
814190/840443 (96.88%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\english.dic f:\testset\english.dic.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 451105/4067439 bytes (0.89 bpc)
Speed: 479 kB/s (2035.9 ns/byte)
Time: 8.28 s
1661250/1753468 (94.74%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\FlashMX.pdf f:\testset\FlashMX.pdf.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 3650013/4526946 bytes (6.45 bpc)
Speed: 303 kB/s (3220.3 ns/byte)
Time: 14.58 s
281184/287437 (97.82%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\FP.LOG f:\testset\FP.LOG.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 426899/20617071 bytes (0.17 bpc)
Speed: 488 kB/s (1999.3 ns/byte)
Time: 41.22 s
13953959/14063791 (99.22%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\MSO97.DLL f:\testset\MSO97.DLL.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Applying x86 transform.
Encoding: done.
Ratio: 1596934/3782416 bytes (3.38 bpc)
Speed: 288 kB/s (3383.3 ns/byte)
Time: 12.80 s
335236/350267 (95.71%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\ohs.doc f:\testset\ohs.doc.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 748289/4168192 bytes (1.44 bpc)
Speed: 396 kB/s (2462.7 ns/byte)
Time: 10.27 s
2221984/2248349 (98.83%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\rafale.bmp f:\testset\rafale.bmp.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 742157/4149414 bytes (1.43 bpc)
Speed: 349 kB/s (2794.1 ns/byte)
Time: 11.59 s
925509/1051685 (88.00%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\vcfiu.hlp f:\testset\vcfiu.hlp.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 511397/4121418 bytes (0.99 bpc)
Speed: 365 kB/s (2669.0 ns/byte)
Time: 11.00 s
1589466/1631281 (97.44%)
G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\world95.txt f:\testset\world95.txt.tst17
CMM4 v0.1f by C. Mattern Jun 8 2008
Experimental file compressor.
Init: Order6,4-0 context mixing coder.
Allocated 116494 kB.
Encoding: done.
Ratio: 455580/2988578 bytes (1.22 bpc)
Speed: 347 kB/s (2807.7 ns/byte)
Time: 8.39 s
621905/642724 (96.76%)
Looks like the numbers I wrote down from memory were wrong; these are the correct ones. What I modified is:
Code:
...
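// shift rzCtx left by two bits; bit 1 is set when rzQueues[0] matches
// the Char() of rzQueues[1], rzQueues[2] and rzQueues[3]
rzCtx = 4*rzCtx + 2*
  (rzQueues[0]->Match(rzQueues[1]->Char()) &
   rzQueues[0]->Match(rzQueues[2]->Char()) &
   rzQueues[0]->Match(rzQueues[3]->Char()));
// guess only if all agree and rzQueues[3] has more than a single hit
r0Guess = (rzCtx&2)!=0 && rzQueues[3]->Hits()>1;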
...
There must be more than a single hit; I forgot that, too. After the coded character c is known:
Code:
...
r0Hits += r0Guess && (rzQueues[3]->Match(c)); // guess made and it was right
r0Count += r0Guess;                            // every guess made
...
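To make the two excerpts concrete, a self-contained sketch with a stand-in `RzQueue` class. Only the member names `Char()`, `Match()` and `Hits()` and the variables `rzCtx`, `r0Guess`, `r0Hits`, `r0Count` come from the excerpts; the queue's behaviour here (it just holds one remembered character and a hit counter) and the helpers `UpdateRzCtx`, `CountR0` and `FormatR0Stats` are assumptions for illustration, not CMM4 source. `FormatR0Stats` reproduces the last line of each run in the log, with the format inferred from that output.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a CMM4 rank-zero queue: it only remembers one
// character for its context plus a hit counter.
class RzQueue {
public:
    RzQueue(unsigned char c, int hits) : c_(c), hits_(hits) {}
    unsigned char Char() const { return c_; }             // remembered (rank-0) character
    bool Match(unsigned char c) const { return c == c_; }
    int Hits() const { return hits_; }                     // assumed: how often Char() was confirmed
private:
    unsigned char c_;
    int hits_;
};

using Queues = std::array<const RzQueue*, 4>;

// Mirrors the first excerpt: shift rzCtx left by two bits, set bit 1 when
// queue 0 matches the Char() of queues 1, 2 and 3, and guess only if they
// all agree and queue 3 has more than a single hit.
std::uint32_t UpdateRzCtx(std::uint32_t rzCtx, const Queues& rzQueues, bool& r0Guess) {
    rzCtx = 4 * rzCtx + 2 *
        (rzQueues[0]->Match(rzQueues[1]->Char()) &
         rzQueues[0]->Match(rzQueues[2]->Char()) &
         rzQueues[0]->Match(rzQueues[3]->Char()));
    r0Guess = (rzCtx & 2) != 0 && rzQueues[3]->Hits() > 1;
    return rzCtx;
}

// Mirrors the second excerpt, once the coded character c is known.
void CountR0(bool r0Guess, const Queues& rzQueues, unsigned char c,
             std::uint64_t& r0Hits, std::uint64_t& r0Count) {
    r0Hits += r0Guess && rzQueues[3]->Match(c);  // guess made and it was right
    r0Count += r0Guess;                          // every guess made
}

// "<hits>/<guesses> (<percent with two decimals>%)", like "85/92 (92.39%)".
std::string FormatR0Stats(std::uint64_t r0Hits, std::uint64_t r0Count) {
    const double pct = r0Count ? 100.0 * static_cast<double>(r0Hits) /
                                     static_cast<double>(r0Count)
                               : 0.0;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%llu/%llu (%.2f%%)",
                  static_cast<unsigned long long>(r0Hits),
                  static_cast<unsigned long long>(r0Count), pct);
    return std::string(buf);
}
```

Keeping the agreement flag in `rzCtx` (rather than only in `r0Guess`) means the history of agreements is also available as model context; the `Hits()>1` test is the extra condition mentioned above.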
In my experience, order 0 mainly helps on binary data (mostly x86 data). Are you still talking about the results with an e8e9 filter?
Does your implementation _always_ do the (inverse) transform forward within a buffer?
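For comparison, a minimal E8/E9 filter in its common generic form (an assumption; this is neither the implementation under discussion nor CMM4's x86 transform): the 32-bit little-endian operand after every 0xE8 (CALL) / 0xE9 (JMP) byte is converted from relative to absolute by adding the offset of the next instruction. Since the opcode bytes are left unchanged and both directions skip the 4 operand bytes, decoding can also scan forward through the buffer and visits exactly the same positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic E8/E9 transform (sketch). `encode` = true converts relative
// operands to absolute ones, false undoes it. Both directions scan forward
// and skip the 4 operand bytes after each 0xE8/0xE9 byte.
void e8e9(std::vector<std::uint8_t>& buf, bool encode) {
    for (std::size_t i = 0; i + 5 <= buf.size(); ++i) {
        if (buf[i] != 0xE8 && buf[i] != 0xE9) continue;
        std::uint32_t v = 0;
        for (int k = 0; k < 4; ++k)                          // little-endian operand
            v |= std::uint32_t(buf[i + 1 + k]) << (8 * k);
        const auto next = static_cast<std::uint32_t>(i + 5); // offset of the next instruction
        v = encode ? v + next : v - next;                    // wraps modulo 2^32
        for (int k = 0; k < 4; ++k)
            buf[i + 1 + k] = static_cast<std::uint8_t>(v >> (8 * k));
        i += 4;                                              // skip the operand
    }
}
```

The simple filter fires on every 0xE8/0xE9 byte, not only on real instructions; that is harmless for reversibility, because the decoder makes exactly the same decisions.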
As I said, I was surprised that higher orders help at all. After some thought, not that much anymore, at least for an order-2 context sort only, since its separation isn't as strong as with BWT (whose context order is limited only by memory).
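To illustrate the difference, a sketch of a limited-order context sort (similar in spirit to the Schindler transform): every byte is sorted by only its preceding k bytes, with ties kept in file order, whereas BWT effectively sorts by the whole preceding context. The function name, reading the context newest byte first, and padding the first k positions with zero bytes are assumptions for illustration; the sketch also omits whatever a decoder would need to invert it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Order-k context sort (sketch): output the bytes of `in` ordered by their
// preceding k bytes (bytes before the start count as '\0'); positions with
// equal contexts keep their original order (stable sort).
std::string contextSort(const std::string& in, std::size_t k) {
    auto ctx = [&](std::size_t i) {            // context of position i, newest byte first
        std::string c(k, '\0');
        for (std::size_t d = 1; d <= k && d <= i; ++d) c[d - 1] = in[i - d];
        return c;
    };
    std::vector<std::size_t> idx(in.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return ctx(a) < ctx(b); });
    std::string out;
    for (std::size_t i : idx) out += in[i];
    return out;
}
```

For "abcabd" with k = 2, the two bytes following "ab" ('c' and 'd') end up adjacent; with only two context bytes, anything that differs further back is no longer separated, which is where BWT's unbounded order gives sharper groups.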