Page 2 of 2
Results 31 to 39 of 39

Thread: Comparison of the recent CM demo coders

  1. #31
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    > How many orders do you mix at once for a unary guess?
    > Something around three should be ok?
    > I'll try anyway, but I need a point to start.

    Well, I mixed them all; I think that's also a good starting point.
    There's also the question of "update exclusion" - a model for
    symbols that escaped from orders >N is clearly different from
    a normal order-N model.

    > By accident I found a way to estimate R0 symbols with a hit
    > rate of 97-99% (!!!) on all files except acrord32.exe (here
    > it was 92%).

    I wonder what you were trying to say here... because if you were
    really able to guess the next symbol with even 92% probability,
    you'd get 0.4bpc on SFC. Don't forget about the Hutter Prize

    Did you mean the most probable symbol in context?
    And is that based only on the context's statistics, or on global statistics?

    > Something about your ST: do you encode the symbol streams
    > *per order 2 context* (after ST) using order 10 mixing? This
    > clearly is something > order 2.

    Well, I didn't say anything about it being order-2.
    Of course it's order-3, the same way Matt's fpaq0f is order-2
    (and fpaq0f2x is order-3).

    But now you can see the gain from doing so. There are results
    for plain order-0, MtF+order-0, order-1 and MtF+order-1 in my table,
    so you can see how it affects the prediction quality
    for different kinds of order-2 counters
    Last edited by Shelwien; 6th June 2008 at 22:39.

  2. #32
    Administrator Shelwien's Avatar
    Btw, with all the mixing talk, I forgot:
    you don't actually have to mix anything at all.
    The predictions of any submodels are just another context.

  3. #33
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If the R0 symbols under orders N, N-1, ..., N-M match, there's a very high chance that the R0 symbol matches the next one. This condition is fulfilled for orders 6-2 on around 20% of the input data, and when it holds, the chance of a successful match is very high (97-99%).
    So it's an easy way to identify highly redundant data. I'm using this as an SSE context in CMM4, in combination with other similar things.

    > Btw, with all the mixing talk, I forgot.
    > You don't have to mix anything at all actually.
    > Predictions of any submodels are just another context.

    Basically you can call prediction merging whatever you want... Do you mean contexts for SSE here?

    Somehow I'm unable to download anything from compression.ru?! Could you include these as attachments?

    I see that you've listed M012; I thought it was M01
    Last edited by toffer; 7th June 2008 at 14:57.

  4. #34
    Administrator Shelwien's Avatar

  5. #35
    Administrator Shelwien's Avatar
    > in around 20% of the input data. The chance for a successful match is very high (97-99%).

    Yeah, it's really misleading
    But then, your numbers actually seem low.
    E.g., here are the counts by rank ( freq(mtf(bwt(book1))) ):
    (total size 768771) 0: 386450, 1: 116648, 2: 62043, 3: 41128, 4: 28736, 5: 20872 [...]
    As you can see, more than 50% are rank 0 there.
    And here are the counts for acrord32:
    (total size 3870784) 0: 1956567, 1: 283597, 2: 160721, 3: 111042, 4: 84835, 5: 67897 [...]

    > > You don't have to mix anything at all actually.
    > > Predictions of any submodels are just another context.

    > Basically you can call prediction merging what you want... Do you mean contexts for SSE, here?

    I mean exactly that you can treat predictions the same way as any other context.
    Of course, you'd (partly) know the origin of this context, so it'd be easier to quantize,
    but then there's not much difference from any PCM (audio data) model or the like.

    > I see that you've listed M012, i thought it was M01

    Btw, my current results are 1850xxxx (st2rc v3 was 18703791) - optimization in progress.
    I mean, the previous results were optimized too, but I've mixed in another order-0 model,
    so now it's mix(o0b,mix[ctx](o0a,o1)).
    But the paq8 -6 result for the same data (st2+e8 output) is 17256448, so there's surely
    room for improvement.
    Still wonder how it does that, though... Can order-2+ _in ST2 output_ be of any help?..
    Last edited by Shelwien; 8th June 2008 at 00:22.

  6. #36
    Programmer toffer's Avatar
    It's not the same situation as with BWT, since in a "lookup-table manner" (looking up the contexts) the ranks change "over time" (with file/buffer position). You would have to modify this:

    c_i-1 | r0_i-1
    c_i | r0_i <- this is the currently viewed order-6 context c_i
    c_i+1 | r0_i+1

    Knowing where c_i is located in the input data, probe backwards and see if all truncated order 5-2 contexts which occur first have the same R0 character.

    I modified CMM4 to output the counts again (note: I use hashing, so the true numbers must be higher due to unresolved collisions):

    Code:
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\A10.jpg f:\testset\A10.jp
    g.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 828527/842468 bytes (7.87 bpc)
      Speed: 278 kB/s (3505.2 ns/byte)
      Time: 2.95 s
    
    
    85/92 (92.39%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\AcroRd32.exe f:\testset\A
    croRd32.exe.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1184968/3870784 bytes (2.45 bpc)
      Speed: 369 kB/s (2640.0 ns/byte)
      Time: 10.22 s
    
    
    814190/840443 (96.88%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\english.dic f:\testset\en
    glish.dic.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 451105/4067439 bytes (0.89 bpc)
      Speed: 479 kB/s (2035.9 ns/byte)
      Time: 8.28 s
    
    
    1661250/1753468 (94.74%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\FlashMX.pdf f:\testset\Fl
    ashMX.pdf.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 3650013/4526946 bytes (6.45 bpc)
      Speed: 303 kB/s (3220.3 ns/byte)
      Time: 14.58 s
    
    
    281184/287437 (97.82%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\FP.LOG f:\testset\FP.LOG.
    tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 426899/20617071 bytes (0.17 bpc)
      Speed: 488 kB/s (1999.3 ns/byte)
      Time: 41.22 s
    
    
    13953959/14063791 (99.22%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\MSO97.DLL f:\testset\MSO9
    7.DLL.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
      Applying x86 transform.
    Encoding: done.
      Ratio: 1596934/3782416 bytes (3.38 bpc)
      Speed: 288 kB/s (3383.3 ns/byte)
      Time: 12.80 s
    
    
    335236/350267 (95.71%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\ohs.doc f:\testset\ohs.do
    c.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 748289/4168192 bytes (1.44 bpc)
      Speed: 396 kB/s (2462.7 ns/byte)
      Time: 10.27 s
    
    
    2221984/2248349 (98.83%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\rafale.bmp f:\testset\raf
    ale.bmp.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 742157/4149414 bytes (1.43 bpc)
      Speed: 349 kB/s (2794.1 ns/byte)
      Time: 11.59 s
    
    
    925509/1051685 (88.00%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\vcfiu.hlp f:\testset\vcfi
    u.hlp.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 511397/4121418 bytes (0.99 bpc)
      Speed: 365 kB/s (2669.0 ns/byte)
      Time: 11.00 s
    
    
    1589466/1631281 (97.44%)
    
    G:\projects\cmm4-080527\bin\Release>cmm4 43 f:\testset\world95.txt f:\testset\wo
    rld95.txt.tst17
    CMM4 v0.1f by C. Mattern  Jun  8 2008
    Experimental file compressor.
    Init: Order6,4-0 context mixing coder.
      Allocated 116494 kB.
    Encoding: done.
      Ratio: 455580/2988578 bytes (1.22 bpc)
      Speed: 347 kB/s (2807.7 ns/byte)
      Time: 8.39 s
    
    
    621905/642724 (96.76%)
    Looks like the numbers I'd written down from memory were wrong - these are correct. And what I modified is:

    Code:
    ...
        rzCtx = 4*rzCtx + 2*
          (rzQueues[0]->Match(rzQueues[1]->Char())&
           rzQueues[0]->Match(rzQueues[2]->Char())&
           rzQueues[0]->Match(rzQueues[3]->Char()));
    
        r0Guess = (rzCtx&2)!=0 && rzQueues[3]->Hits()>1;
    ...
    There must be more than a single hit; I forgot this, too. And after the coded character c is known:

    Code:
        ...
        r0Hits += r0Guess && (rzQueues[3]->Match(c));
        r0Count += r0Guess;
        ...
    In my experience order 0 mainly helps on binary data (mostly x86 data). Are you still talking about the results with an e8e9 filter?
    Does your implementation always do the (inverse) transform forward within a buffer?

    As I said, I was surprised that higher orders help at all. After some thought, not that much - at least for only an order-2 context sort, since the separation isn't as strong as with BWT (where the order is limited only by the memory).

  7. #37
    Administrator Shelwien's Avatar
    > It's not the same situation with BWT, since in a
    > "lookup-table-manner" (lookup the contexts) the ranks change
    > "over time" (file/buffer position).

    Maybe I'm misunderstanding something again, but I once made a straight
    symbol-ranking transform (without the entropy coding part), and learned
    that it commonly has a very similar rank distribution to that of MTF(BWT).

    > (rzQueues[0]->Match(rzQueues[1]->Char())&
    > rzQueues[0]->Match(rzQueues[2]->Char())&

    This reminded me of ppmonstr... and my own ash, too -
    there are things like that in the SSE contexts.

    > In my experience order 0 mainly helps on binary data (mostly x86 data).

    I already said that order-0 isn't very important, except for the compression of small files.
    (Not using it might cause some constant redundancy, like 300 bytes per file.)

    And executables are not really binary... they're more like containers of different
    filetypes - tables and images and text... so you need to have models for
    all of these and to mix or switch them precisely. And x86 code has known syntax and patterns
    which may help to greatly improve its compression compared to a "universal" model.

    > You are still talking about the results with an e8e9 filter?

    Well, yes, since I already implemented it there.

    > Your implementation does the (inverse) transform _always_ forward within a buffer?

    Yes, check out the ST2.inc

    > As i said i was surprised that higher orders help at all.

    Well, I was more surprised that MTF(MTF) turned out badly
    (I tried it after seeing long runs of 01 values in fp.log's MTF output).
    Also, both my CM and paq8 seem to compress the direct data
    significantly better... my results on ST2 output are actually
    better than paq8's results for MTF(ST2)

    > After some thoughts not that much - at least for only an order 2 context sort,

    Well, I knew there were correlations like that, but I kind of
    hoped that paq8 was inadaptive enough that I could stop there

    > since the separation isn't as strong as BWT (order is limited only by the memory).

    Try compressing BWT output with different orders (ppmonstr or my compressors
    allow specifying that) - I guess you'd be even more surprised.

  8. #38
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    BTW, what do Unary Coding/Model/Guessing etc. mean? Are we talking about a simple character guess, i.e. a single predicted/mispredicted flag per character?
    What's the idea behind it?

  9. #39
    Programmer toffer's Avatar
    I think you mean unary decomposition?

    You encode the current symbol's rank in a queue of probable symbols (in most cases, the MTF rank under higher orders) as a unary number. Each bit's probability can be derived from your model.
    In most cases you only need to encode a few bits. For example, have a look at sr2's output (or BWT output): roughly 70% R0 hits, 20% R1 hits and 5% R2 hits. So you encode 95% (just a rough estimate) of the symbols in _at most_ 3 coding steps.


