Page 1 of 2 12 LastLast
Results 1 to 30 of 39

Thread: Comparison of the recent CM demo coders

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts

    Comparison of the recent CM demo coders

    E2160 @ 9x360=3.24, DDR2-800 5-5-5-18 @ 900
    http://shelwien.googlepages.com/cm200805.htm

    Also here're a windows port of m01 and "fpaq0f2x" source.
    Attached Files Attached Files
    Last edited by Shelwien; 30th May 2008 at 03:43.

  2. #2
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for testing.

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Now, here's something exclusive: http://shelwien.googlepages.com/cm200505-iphone.htm

    Quite the different metric ranking...
    Well, no wonder since the damn thing has only 16k read cache,
    no write cache, and really slow multiplications (via FPU, even integer).

  4. #4
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Interesting test! Thanks Shelwien!

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    http://shelwien.googlepages.com/cm200806.htm
    fcm2a added (when'd you learn to properly compile things yourself already?
    o2_xx SSE2 coders retested and added (from sse.htm)
    fcm2sh4 added
    - you have too much different scales there. it slows things down.
    - managed to get better results than fcm2, but guess that my mixer
    really doesn't like the nonlinear counters. Also note that its significantly
    better on most of files but loses too much on fp.log (same goes for
    m01 vs o2_xx btw). Guess I'd have to do something special about
    determinated contexts.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    http://shelwien.googlepages.com/cm200806.htm
    fcm2a added (when'd you learn to properly compile things yourself already?
    o2_xx SSE2 coders retested and added (from sse.htm)
    fcm2sh4 added
    - you have too much different scales there. it slows things down.
    - managed to get better results than fcm2, but guess that my mixer
    really doesn't like the nonlinear counters. Also note that its significantly
    better on most of files but loses too much on fp.log (same goes for
    m01 vs o2_xx btw). Guess I'd have to do something special about
    determinated contexts.
    OK, thanks! Actually, indeed better use different update rates for lower and higher orders. Might be this gives the most compression difference. Also it was specially tuned... Mine wasn't. Still not really beleive in your shamanic weight update. Interesting how it will work with an adidtional context - as FCM2a does.
    Translating to human language your update:
    If prediction difference between p1 and p2 achieves some threshold (below or above) correct the weight:
    __min/__max just set bounds of the weight
    OK, division by SCALE will be replaced by shift by the compiler, but division by (p2-p1) will kept as is.

    ((SCALE-wr)*w+SCALE-1)/SCALE + wr*(P-p1)/(p2-p1) )

    At the first sight it's computationally expensive. However, according to your timings it works pretty fast. Will test such thing... But at the moment I like mine...

  7. #7
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Your "fcm2sh4_xi_ipo_prof-3" is quick. That would be quite a challenge for my optimised compiles.

  8. #8
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    ((SCALE-wr)*w+SCALE-1)/SCALE + wr*(P-p1)/(p2-p1) )
    Thats an exponentially smoothed (1-wr, wr) version between the old weight and the solution of a counter back-propagation, damn, i've rewritten it 3 times
    Last edited by toffer; 2nd June 2008 at 21:49.

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > indeed better use different update rates for lower and higher orders.

    Also better not use the power-of-2 speeds and noisy += updates.

    > Might be this gives the most compression difference.

    I don't think so as a counter with shift 2 is not really a counter.
    Probably replacing your counter pair with a single counter of mine would be benificial
    for this coder.

    > Also it was specially tuned...

    Now, after I really tuned it right (with the sum of separately compressed files),
    it shows 24171068.

    > Mine wasn't.

    Algorithm selection is tuning too

    > Still not really beleive in your shamanic weight update.

    This one is not shamanic actually.
    It just tries to update the mixed probability the same way
    as if it was a counter (ie remixing after mixer update would
    result in updated probability value).
    So if you won't update the original counters, then this mixer's output
    would behave exactly as if it was a single counter.

    > Interesting how it will work with an adidtional context - as FCM2a does.

    I would probably always be able to beat your results after tuning various things,
    of course only if the coder is simple enough.

    > If prediction difference between p1 and p2 achieves some threshold
    > (below or above) correct the weight: __min/__max just set bounds of the weight

    Both |p1-p2| check and clipping are only necessary because of imprecise integer calculation there.

    > OK, division by SCALE will be replaced by shift by the compiler,
    > but division by (p2-p1) will kept as is.

    It actually can be replaced with table lookup and multiplication.
    And also its skipped most of the time because of similar probability values.

    > But at the moment I like mine...

    Well, as I said, I did use something like that too.
    But this mixer is universal, while "yours" won't be good in different circumstances.
    Last edited by Shelwien; 2nd June 2008 at 22:15.

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    But this mixer is universal, while "yours" won't be good in different circumstances.
    Exactly! Mixing just two probabilities with known nature is what we have here. Mostly we know that order-1/2 is more precise than just order-0. It's not true only at the beginning of encoding - while order-1/2 not gather needed amount of statistics, or in some extreme cases. Also, it's mainly true for such simple encoders like FCM2, and especially on text files. In different situations like LZ-output coding, the things are slightly different. However, for simple tasks we should use simple things...

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    http://shelwien.googlepages.com/cm200806.htm updated.

    - m012 v2 added (as m012a)
    - fcm2sh5 added

    @toffer: Great. 2M less and you'll reach ppmonstr's results with o2
    (well, his is unfair o2 as it uses extra contexts in SSE and
    also there's a sparse o1 submodel

    @encode: These are new constants for the fcm2sh4->sh5:
    static const int FCM2sh4_s1r1 = 1;
    static const int FCM2sh4_s1r2 = 3;
    static const int FCM2sh4_s2r1 = 2;
    static const int FCM2sh4_s2r2 = 5;
    static const int FCM2sh4_mlim = 6127;
    static const int FCM2sh4_mwr = 116;
    static const int FCM2sh4_mmw = 0;

    @LP: Looks like a hit this time (though I tested only one build).

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    @encode: These are new constants for the fcm2sh4->sh5:
    static const int FCM2sh4_s1r1 = 1;
    static const int FCM2sh4_s1r2 = 3;
    static const int FCM2sh4_s2r1 = 2;
    static const int FCM2sh4_s2r2 = 5;
    static const int FCM2sh4_mlim = 6127;
    static const int FCM2sh4_mwr = 116;
    static const int FCM2sh4_mmw = 0;

  13. #13
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Thanks Shelwien!

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    Interesting...

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Btw, making this, I remembered about distance coding.
    So, here's a question: did you try LZ with _forward_ references?
    Like "copy the next string of specified length to specified offset"?
    Because distance coding is known to work better than plain MTF,
    and MTF is a LZ without match lengths.

  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    So, here's a question: did you try LZ with _forward_ references?
    Nope... I'm hoping that ROLZ should be supreme anyway!

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    ROLZ or not doesn't matter.
    If you encode backward references, then you can encode forward references as well.

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    ROLZ or not doesn't matter.
    If you encode backward references, then you can encode forward references as well.
    I might be not really understood the DC idea - since I'm not with BWT...
    Is it similar to reverse the entire input?

  21. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    No.
    With MtF you store the number of different symbols since last occurence of given symbol.
    And with DC you store an offset to the next occurence of given symbol.
    Well, usually the DC offset is also masked by not counting the already referenced cells...
    which is the same as subtracting the MtF rank of symbol at the target location.

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts

    Version-a-day series

    http://shelwien.googlepages.com/st2rc_v2.rar
    http://shelwien.googlepages.com/st2rc_v2.htm

    - compression improved by switching to order1 model of MTF output
    - tested the effect of disabling the fpaq0pv4B "macrofication" preprocessor
    - tested the rc_mm (original rangecoder from fpaq0p) vs rc_sh1m

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts

    ST2rc FTW

    Now v3 has better compression than toffer's m012a with 3x faster speed.

    http://shelwien.googlepages.com/st2rc_v3.rar
    http://shelwien.googlepages.com/st2rc_v3.htm

    - adaptive mixing of order0 & order1 added
    - MTF disabled as it hurts compression %-)
    - E8/E9 filter added
    - temporary switch to rc_mm as rc_sh2d produced 200k worse results
    (due to simplified renormalization)

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    And how about ST4?

  25. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Goals of this experiment are:
    1. Investigation of context histories' properties
    (And showing to toffer that order0 modelling is not enough for context histories)
    2. rc_mm vs rc_sh comparison
    3. counters comparison (Shift-counters vs Node2i for now, may include
    delayed counters later)
    4. binary vs unary encoding
    5. mixing and SSE

    So I'm not sure about ST4. There's not enough address space to do it straight,
    and I don't have any particularly interesting ideas about its implementation.
    Although maybe we can use Matt's bbb instead.

    Next plans:
    1. Investingating the reason of rc_sh redundancy (long sequencies of improbable
    values are clearly wrong)
    2. Unary/binary symbol coding
    3. SSE

  26. #26
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    For my next CM experiments i plan to do unary coding - you convinced me of the speed and compression gain (and as i said, i did a static variable length code decomposition, which was notably faster than straight flat decomposition).
    One clearly needs some PPM data structures. Could you tell me any references?

  27. #27
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    - E8/E9 filter added
    Well, this is unfair to compare. And i'm m01 does only store the weights.

  28. #28
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Well, I only said that to attract attention. But its true too
    And the result without E8 is included for reference. Its worse than yours, but not that far.
    Also of course E8 is unfair, but ST2 is unfair too - the coder has to
    process the same data type changes 64k times.
    Btw that's why I'm not particularly interested in ST4 or BWT for this.
    Last edited by Shelwien; 6th June 2008 at 14:57.

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    > For my next CM experiments i plan to do unary coding

    That's great.
    There're also things to check out in that area.
    Like, all the known implementations use MtF ranking,
    but it should be best to design some fast model that
    would assign rank0 to a really most probable symbol.

    > One clearly needs some PPM data structures.

    There's not much choice actually.
    Well, you can keep the pointers to context nodes in a hash,
    but there's no reason as it would be faster with a tree.
    And same way, you can link the symbol nodes into a list,
    but its obviously faster to use arrays.

    Although you can choose if you really want a suffix tree
    (which contains a link to "bc" node in "abc" node) or not,
    but it only has sense if you're going to make a PPM, which
    would mostly encode all the symbols in highest order.

    Also I recently think that it should be viewed as a caching system,
    because any necessary statistics can be dynamically calculated,
    given the data, and then you can cache the frequently accessed
    statistics.

    > Could you tell me any references?

    http://compression.ru/download/artic..._ppmii_pdf.rar
    http://en.wikipedia.org/wiki/Suffix_tree
    http://www.compression.ru/ds/ppmdj1.rar (SubAlloc.hpp)
    http://www.compression.ru/ds/ppmsj.rar
    http://compression.ru/sh/ppmy_3c_sse.rar (ctx_tree.inc)
    http://compression.ru/sh/ash04.rar (ash_tree.inc)

  30. #30
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks!

    How may orders do you mix at once for a unary guess? Something around three should be ok? I'll try anyway, but i need a point to start.

    By accident i found a way to estimate R0 symbols with a hit rate of 97-99% (!!!) on all files expect acrord32.exe (here it was 92%). I guess i'll use this heuristic . This technique could be applied to around 20% of the data.

    Something about your ST: do you encode the symbol streams *per order 2 context* (after ST) using order 10 mixing? This clearly is something > order 2.
    Last edited by toffer; 6th June 2008 at 18:37.

Page 1 of 2 12 LastLast

Similar Threads

  1. LZW, LZMW and LZAP comparison
    By encode in forum Data Compression
    Replies: 14
    Last Post: 3rd August 2017, 16:34
  2. Comparison of lossless PNG compression tools
    By Surfer in forum Data Compression
    Replies: 54
    Last Post: 19th September 2011, 23:58
  3. M1 - Optimized demo coder
    By toffer in forum Data Compression
    Replies: 189
    Last Post: 22nd July 2010, 00:49
  4. exe prefilter quick comparison
    By evg in forum Data Compression
    Replies: 7
    Last Post: 23rd May 2009, 17:20
  5. QUAD-SFX DEMO
    By encode in forum Forum Archive
    Replies: 17
    Last Post: 26th April 2007, 14:57

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •