View Poll Results: What do you think about my ppmx v0.04?

Voters: 33
  • It's great! I love it! And it works really fine for me! (9 votes, 27.27%)
  • Very promising compressor! Will wait for future improvements... (23 votes, 69.70%)
  • Nothing special, I've expected something more! (1 vote, 3.03%)

Thread: ppmx v0.04 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts

    ppmx v0.04 is here!

    Well, I'm moving forward. The new ppmx introduces many improvements, mostly optimizations. Briefly, what's new:
    • An improved and more complete order-12 PPM model
    • Reduced memory usage to ~280 MB
    • Many optimizations and code clean-up
    I believe that with this version I'm onto something. At least the new ppmx easily beats ppmz2 on SFC with no tuning. Note that the current ppmx is still quite basic - I've had no luck yet with SEE and other tricks that might push my compressor even further. Also, I've only just started experimenting with the PPM algorithm. In the past I've mostly worked on LZ-based stuff (lzpx/quad/lzpm/balz), so PPM is relatively new for me. Yep, pimple/pimple2 were cool, but they're too heavy for practical use, and quad has a small PPM built in... I mean I was very close to, but never really into, PPM. Anyway, with ppmx I tried to avoid any LZ-based features, making it a pure context coder.
    I don't know what's next, but I'm hoping ppmx will be supreme. At least it becomes more competitive version by version. I even think ppmx is my best compressor so far. I tested it against quad/balz: balz may win thanks to its built-in EXE filter - ppmx has no filters - but quad is beaten even with its EXE filter, and that's on binary/executable compression. On text files ppmx outperforms all the LZ stuff by a huge margin; ppmx's text compression is close to pimple's while being far faster. I think it's nice!
    Attached Files

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    ppmx v0.04
    Quote Originally Posted by SFC
    A10.jpg -> 849,815 bytes
    acrord32.exe -> 1,641,632 bytes
    english.dic -> 856,492 bytes
    FlashMX.pdf -> 3,758,193 bytes
    fp.log -> 625,635 bytes
    mso97.dll -> 1,972,753 bytes
    ohs.doc -> 863,072 bytes
    rafale.bmp -> 825,730 bytes
    vcfiu.hlp -> 666,466 bytes
    world95.txt -> 485,705 bytes
    total -> 12,545,493 bytes
    calgary.tar -> 806,837 bytes
    canterbury.tar -> 543,762 bytes

    book1 -> 221,254 bytes
    wcc386.exe -> 292,768 bytes

    bookstar -> 9,731,249 bytes
    3200.txt -> 3,946,547 bytes
    bible.txt -> 762,327 bytes
    world00.txt -> 556,628 bytes

    mptrack.exe -> 518,476 bytes
    reaktor.exe -> 2,322,470 bytes

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by ENWIKs
    enwik8 -> 23,150,510 bytes
    enwik9 -> 201,384,355 bytes
    Take into account the much smaller model size and the different model order.

  4. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Not bad, but still needs some work.
    My results on bookstar:
    Code:
    PPMX 0.01	10318337	33.422s
    PPMX 0.02	 9613763	57.594s
    PPMX 0.03	 9628510	51.343s
    PPMX 0.04	 9731249	~52s.
    // You always release the new version before I update my spreadsheets.

  5. #5
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Text compression is good, but I'm sure it can be improved. I wonder, though, why a10.jpg's result is so bad. It's known that byte-wise coding is not good for already-compressed data, but this is far from what I expected. I think you know about this redundancy already. Maybe it's a trade-off for speed or simplicity?
    BIT Archiver homepage: www.osmanturan.com

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by m^2 View Post
    You always release the new version before I update my spreadsheets.
    Yeah, if you can, please, include it in all your benchmarks.

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by osmanturan View Post
    Text compression is good, but I'm sure it can be improved. I wonder, though, why a10.jpg's result is so bad. It's known that byte-wise coding is not good for already-compressed data, but this is far from what I expected. I think you know about this redundancy already. Maybe it's a trade-off for speed or simplicity?
    I tuned the counters for more non-stationary data like calgary.tar. With more stationary counters we would get slightly higher text compression and much higher compression on files like a10.jpg, but at the same time we would lose too much on binaries and on overall compression. So, I tried to hit the global minimum...
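    For illustration only - this is an assumption about the mechanism, with made-up names and constants, not the actual ppmx counter code - the stationary/non-stationary trade-off usually comes down to how aggressively the frequency counters are rescaled:
    Code:
    // Hypothetical frequency counter. A low rescale limit forgets old
    // statistics quickly (suits non-stationary data such as calgary.tar);
    // a high limit keeps them longer (suits stable text).
    #include <cstdint>

    struct SymbolCounter {
        uint16_t freq[256] = {0};

        void update(int sym, uint16_t limit) {   // limit: e.g. 256 vs 8192
            freq[sym] += 32;                     // reward the coded symbol
            if (freq[sym] >= limit) {            // rescale: halve everything,
                for (int i = 0; i < 256; ++i)    // discarding half the history
                    freq[i] = (freq[i] + 1) >> 1;
            }
        }
    };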

  8. #8
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    Yeah, if you can, please, include it in all your benchmarks.
    Yeah, I'm working on it. Multiple tests take a lot of time. I should have 1, maybe 2 tests updated by tomorrow. PPMX 0.04 will be there.

    ADDED: I did some calculations; it's unlikely that even the 1st data update will be finished.
    Last edited by m^2; 5th January 2009 at 21:24.

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by m^2 View Post
    Yeah, I'm working on it. Multiple tests take a lot of time. I should have 1, maybe 2 tests updated by tomorrow. PPMX 0.04 will be there.
    Thanks a lot!

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Could you describe your model's structure?

    Do you use a secondary model for SSE or binary contexts and/or unary coding?

    Nice improvement, and the speed is OK for that compression. But as I said many times - you'd better really "tune" the parameters.

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by toffer View Post
    Could you describe your model's structure?

    Do you use a secondary model for SSE or binary contexts and/or unary coding?
    No SEE, SSE or unary coding. All in all it is a standard order-12-5-3-2-1-0 PPM, although I use a special heuristic for order-12 escape estimation. If we initialize an order-12 context and it is empty - i.e. it will contain only one symbol after initialization - then the escape count is 1/4 and the symbol count is 3/4. In all other cases the escape count is 1/2 and the symbol count is 1/2 (escape method D). However, it is not quite baseline PPMD, because I rescale the escape count during symbol rescaling. In addition, a special hashing mechanism introduced in v0.04 efficiently drops older statistics - thus the new ppmx gets better binary compression.
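    A minimal sketch of that escape heuristic as I understand it (the struct layout and names are assumptions, not the actual ppmx source):
    Code:
    // Counts kept in quarter units so that 1/4, 1/2 and 3/4 are exact integers.
    struct Order12Context {
        int escCount;   // escape count (quarters)
        int symCount;   // count of the single stored symbol (quarters)

        void init(bool wasEmpty) {
            if (wasEmpty) { escCount = 1; symCount = 3; }  // escape 1/4, symbol 3/4
            else          { escCount = 2; symCount = 2; }  // escape 1/2, symbol 1/2 (method D)
        }
        // Unlike baseline PPMd, the escape count is halved together with the
        // symbol counts whenever the context is rescaled.
        void rescale() {
            escCount = (escCount + 1) >> 1;
            symCount = (symCount + 1) >> 1;
        }
    };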

  12. #12
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    I tuned the counters for more non-stationary data like calgary.tar. With more stationary counters we would get slightly higher text compression and much higher compression on files like a10.jpg, but at the same time we would lose too much on binaries and on overall compression. So, I tried to hit the global minimum...
    I think it's a bad idea to tune a PPM coder for non-stationary data. Also, your choice of maximum coding order has no correlation with your tuning, because non-stationary data can usually be modeled with shorter contexts. You can see this effect by observing M1 optimization logs for x86 data (the contexts are shorter). Also, isn't it time to stop tuning your parameters with ad-hoc approaches?
    BIT Archiver homepage: www.osmanturan.com

  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by osmanturan View Post
    Also, your choice of maximum coding order has no correlation with your tuning, because non-stationary data can usually be modeled with shorter contexts. You can see this effect by observing M1 optimization logs for x86 data (the contexts are shorter).
    Well, in practice this PPM model set is the best. Executables usually have long repetitive strings; and if not, ppmx has its lower-order models! Be sure that if I did something, I had a reason for it - I tried all the variants and chose the best one!

    Quote Originally Posted by osmanturan View Post
    Also, isn't it time to stop tuning your parameters with ad-hoc approaches?
    As I said, I was impressed by M1, so for sure I'll run some automated optimization...

  14. #14
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Well, in practice this PPM model set is the best. Executables usually have long repetitive strings; and if not, ppmx has its lower-order models! Be sure that if I did something, I had a reason for it - I tried all the variants and chose the best one!
    You can't be sure until you've done an automatic optimization. Even that doesn't guarantee the global minimum, though.

    Quote Originally Posted by encode View Post
    As I said, I was impressed by M1, so for sure I'll run some automated optimization...
    Finally!...
    BIT Archiver homepage: www.osmanturan.com

  15. #15
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts


    Thanks Ilia!

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts


    Thanks for testing, Matt!
    Anyway, the new PPMX v0.05 is coming! And I did it! I mixed some ideas from PPM and CM and got something crazy! The speed is just slightly lower, but the compression... fp.log compresses down to 413,180 bytes at almost the same speed! And the current SFC result is 11,714,527 bytes, with no filters, from a very rough draft version with almost no tuning... This poll gave me a kick, and I DID IT!



  18. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    calgary.tar -> 725 KB
    book1 -> 207 KB


  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    That's interesting, but what about decompression?

  20. #20
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    That's interesting, but what about decompression?
    "They decompress correctly."

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    That's interesting, but what about decompression?
    The very first version had incorrect decompression due to an overflow in the RC - i.e. it doesn't really compress fp.log to 400K; the correct result is below ~590K... The current version is OK, so 725K on calgary.tar is real. The craziest gain is on binary/executable data; text compression is nearly the same. The max order is 9. The SFC result is about 11,900K. Well, I'm continuing tuning/testing. Currently the main question is the compression/speed/memory tradeoff - in other words, I will add/remove models to keep it reasonably fast with high compression.
    Do you have any ideas for bounds?
    Say, it should use at most ~500 MB, and compression speed should be faster than paq9a, or CCM, CMM, ...
    Actually, it will be slower than PPMd. In my opinion it should be much faster than any CM compressor, at the cost of some compression. Memory use should fit a machine with 1 GB RAM, so it should be less than 800 MB.
    Anyway, just by testing my new PPMX more deeply, I've found it *VERY* efficient!
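    (For context, a generic carryless range-coder renormalization of the Subbotin kind is sketched below; this is not ppmx's actual coder, and the constants are just the usual ones. Decompression breaks as soon as encoder and decoder stop renormalizing under exactly the same conditions.)
    Code:
    #include <cstdint>
    #include <vector>

    struct RangeEncoder {
        static const uint32_t TOP = 1u << 24, BOT = 1u << 16;
        uint32_t low = 0, range = 0xFFFFFFFFu;
        std::vector<uint8_t> out;

        void encode(uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
            low   += cumFreq * (range /= totFreq);
            range *= freq;
            // Emit bytes while the top byte of low is settled; when range gets
            // too small without the top byte settling, truncate range so the
            // coder can keep going (this is the carryless trick).
            while ((low ^ (low + range)) < TOP ||
                   (range < BOT && ((range = (0u - low) & (BOT - 1)), true))) {
                out.push_back((uint8_t)(low >> 24));
                low   <<= 8;
                range <<= 8;
            }
        }
    };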

  22. #22
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode View Post
    The very first version had incorrect decompression due to an overflow in the RC - i.e. it doesn't really compress fp.log to 400K; the correct result is below ~590K... The current version is OK, so 725K on calgary.tar is real. The craziest gain is on binary/executable data; text compression is nearly the same. The max order is 9. The SFC result is about 11,900K. Well, I'm continuing tuning/testing. Currently the main question is the compression/speed/memory tradeoff - in other words, I will add/remove models to keep it reasonably fast with high compression.
    Do you have any ideas for bounds?
    Say, it should use at most ~500 MB, and compression speed should be faster than paq9a, or CCM, CMM, ...
    Actually, it will be slower than PPMd. In my opinion it should be much faster than any CM compressor, at the cost of some compression. Memory use should fit a machine with 1 GB RAM, so it should be less than 800 MB.
    Anyway, just by testing my new PPMX more deeply, I've found it *VERY* efficient!
    800 MB is a bad choice for a 1 GB machine. You can (de)compress, but you have to close all your programs and go do something else in the meantime.
    For benchmarks that's no problem; for real usage it's a serious one. Therefore I suggest something more like 600 MB.

  23. #23
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    The very first version had incorrect decompression due to an overflow in the RC - i.e. it doesn't really compress fp.log to 400K; the correct result is below ~590K... The current version is OK, so 725K on calgary.tar is real. The craziest gain is on binary/executable data; text compression is nearly the same. The max order is 9. The SFC result is about 11,900K. Well, I'm continuing tuning/testing. Currently the main question is the compression/speed/memory tradeoff - in other words, I will add/remove models to keep it reasonably fast with high compression.
    Do you have any ideas for bounds?
    Say, it should use at most ~500 MB, and compression speed should be faster than paq9a, or CCM, CMM, ...
    Actually, it will be slower than PPMd. In my opinion it should be much faster than any CM compressor, at the cost of some compression. Memory use should fit a machine with 1 GB RAM, so it should be less than 800 MB.
    Anyway, just by testing my new PPMX more deeply, I've found it *VERY* efficient!
    It seems you are mixing CM with PPM. This will surely make it slower; I hope you are doing the right thing. Anyway, m^2 is right: 800 MiB is an extreme requirement for systems with 1 GiB of RAM. You might have noticed before that I had set LWCX mode's maximum memory to 650 MiB. Even in that case m^2 couldn't use it (LWCX automatically reduced the memory requirements because of insufficient free memory). So even 500 MiB can be too high in some cases. Also, do not forget Vista users. Vista is a typical memory sucker IMO; its memory usage goes up to 800 MiB. Note that even Windows Server 2003 x64 uses ~256 MiB of memory. For compression strength, I expect a compression-power-oriented compressor to squeeze SFC into <11,XXX,XXX bytes. Also, <1 MiB/second processing speed is not practical.
    BIT Archiver homepage: www.osmanturan.com

  24. #24
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    1. There's no reason for bytewise PPM not to be good at text compression... Traditional bytewise models are rather bad with binaries by design, so there's not much sense in not tuning for text. It might change if you implemented SSE or something similar to a bitwise update with byte distributions (a generic sketch follows below), but I guess that's not the case here.

    2. 500M is too much. It would become faster, too, if you made it more compact. It's not like there are 100 submodels, and for something of ppmd level 64M should be enough.

    3. If it's slower than ppmd, then you can compare it to ppmonstr.
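    A purely illustrative sketch of such a secondary-estimation stage; the table sizes, context choice, and update rate here are all assumptions, and nothing like this exists in ppmx:
    Code:
    #include <cstdint>

    // SSE: refine a primary bit probability by looking it up, quantized, in a
    // table of adaptive entries keyed by a small extra context.
    struct SSE {
        uint16_t table[256][64];          // [context byte][quantized p]

        SSE() {
            for (int c = 0; c < 256; ++c)
                for (int b = 0; b < 64; ++b)
                    table[c][b] = (uint16_t)(b * 64 + 32);  // start near identity
        }
        // p is the primary probability of bit==1, scaled to 0..4095.
        int refine(int p, uint8_t ctx) const { return table[ctx][p >> 6]; }

        // After coding, pull the used entry toward the observed bit.
        void update(int p, uint8_t ctx, int bit) {
            int target = bit ? 4095 : 0;
            uint16_t& t = table[ctx][p >> 6];
            t = (uint16_t)(t + (target - t) / 32);
        }
    };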

  25. #25
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by osmanturan View Post
    It seems you are mixing CM with PPM. This will surely make it slower; I hope you are doing the right thing. Anyway, m^2 is right: 800 MiB is an extreme requirement for systems with 1 GiB of RAM. You might have noticed before that I had set LWCX mode's maximum memory to 650 MiB. Even in that case m^2 couldn't use it (LWCX automatically reduced the memory requirements because of insufficient free memory). So even 500 MiB can be too high in some cases. Also, do not forget Vista users. Vista is a typical memory sucker IMO; its memory usage goes up to 800 MiB. Note that even Windows Server 2003 x64 uses ~256 MiB of memory. For compression strength, I expect a compression-power-oriented compressor to squeeze SFC into <11,XXX,XXX bytes. Also, <1 MiB/second processing speed is not practical.
    Going slightly over the available memory is only a slight problem; there are usually a lot of things used only once that can be swapped out without any problem. I found that I can push the commit charge to about 1100 MB and it's OK, so for me 600 MB is fine.
    IIRC BIT really used about 680 MB and that was too much.

    And Vista is unusably slow with 1 GB, so I doubt that configuration is really something to plan for.

  26. #26
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien View Post
    3. If it's slower than ppmd, then you can compare it to ppmonstr.
    PPMd was beaten, at least on binaries! In addition, the new PPMX should be much faster than PPMonstr. The basic PPMX core is just ~1.1X slower than v0.03 on text files - i.e. the speed is almost the same. Adding extra models may affect speed slightly; in some cases it might be noticeably slower than PPMX v0.03, since the new version uses static mixing (full blending) and accesses all contexts for every byte. Note that the basic structure of the program is the same, including the hash tables.
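    Since "static mixing (full blending)" may be unfamiliar, here is a rough sketch of the idea with made-up weights and layout - an illustration, not the actual v0.05 code:
    Code:
    #include <cstdint>

    // Full blending: every order contributes to one mixed distribution with a
    // fixed weight, rather than escaping down from order to order.
    // freqs[o][s]: frequency of symbol s in the current context of order o.
    // weight[o]:   fixed (static) weight of order o, e.g. 1, 2, 4, ...
    void blend(const uint32_t freqs[][256], const uint32_t weight[], int orders,
               uint32_t mixed[256]) {
        for (int s = 0; s < 256; ++s) mixed[s] = 1;       // floor: no zero probabilities
        for (int o = 0; o < orders; ++o) {
            uint32_t total = 0;
            for (int s = 0; s < 256; ++s) total += freqs[o][s];
            if (total == 0) continue;                     // empty context: no contribution
            for (int s = 0; s < 256; ++s)
                mixed[s] += (uint64_t)weight[o] * freqs[o][s] * 4096 / total;
        }
        // "mixed" is then coded like a single order-0 distribution.
    }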

  27. #27
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Also, I think I'll add an LZP model - much like the one in PAQ9a. All versions of PPMX have bad performance on long repetitive data like pht.psd; an LZP model should solve this problem. Also, with such a model I may be able to decrease the max order of the main model, making PPMX faster...
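    To make the idea concrete, here is a rough hypothetical LZP match model in the spirit of (but not copied from) paq9a; the table size and hash are arbitrary:
    Code:
    #include <cstdint>
    #include <vector>

    struct LZPModel {
        std::vector<uint32_t> table;    // context hash -> position in history
        std::vector<uint8_t>  history;  // bytes processed so far
        LZPModel() : table(1 << 20, 0) {}

        // Predicted next byte, or -1 if this context has not been seen.
        int predict() const {
            if (history.size() < 4) return -1;
            uint32_t pos = table[hashLast4()];
            return pos ? history[pos] : -1;   // byte that followed the same context last time
        }
        void update(uint8_t byte) {
            if (history.size() >= 4) table[hashLast4()] = (uint32_t)history.size();
            history.push_back(byte);
        }
    private:
        uint32_t hashLast4() const {
            size_t n = history.size();
            uint32_t x = history[n-1] | (history[n-2] << 8) |
                         (history[n-3] << 16) | (history[n-4] << 24);
            return (x * 2654435761u) >> 12;   // fold into a 20-bit table index
        }
    };
    // When predict() hits, the main model can code a cheap match flag instead
    // of a full literal, which is exactly where long repeats are won.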

  28. #28
    Member
    Join Date
    Jun 2008
    Location
    Lévis, Canada
    Posts
    30
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I have an idea for a PPM framework coder.

    a) In each context, instead of storing symbol frequencies, do a move-to-front (see the sketch below). This way 0 would represent the most recently coded symbol in the present context; 0..M-1 would represent symbols in the present context, M..N-1 would represent symbols in the next lower-order context, and so on up to 255.

    b) If a lower-order context has M or fewer symbols, that context is used instead.

    c) The transformed ranks are funneled into an order-0 arithmetic coder with run-length coding of zeros.

    d) When a memory limit is reached, the least recently used contexts are recycled. I never tried to implement this, so I'm not sure it would work.

    I am busy with libima, so I don't have the time to try it.
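    A tiny sketch of the ranking step described above, as I read it (untested and purely illustrative; the names are made up):
    Code:
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct RankedContext {
        std::vector<uint8_t> mtf;   // symbols seen here, most recently coded first

        // Rank of 'sym' in this context, or -1 if it must come from a lower order.
        int rank(uint8_t sym) const {
            auto it = std::find(mtf.begin(), mtf.end(), sym);
            return it == mtf.end() ? -1 : (int)(it - mtf.begin());
        }
        // Move the coded symbol to the front (or insert it) after coding.
        void update(uint8_t sym) {
            auto it = std::find(mtf.begin(), mtf.end(), sym);
            if (it != mtf.end()) mtf.erase(it);
            mtf.insert(mtf.begin(), sym);
        }
    };
    // Ranks 0..M-1 address this context; larger values fall through to the next
    // lower order, and the resulting small integers go to an order-0 coder with RLE0.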

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    That's called "symbol ranking".

  30. #30
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Check out this non-public version of PPMX v0.04a (not to be shared outside this forum, and it should not be listed on any benchmark site). It's quite a DRAFT version with an all-new core; the model is heavier and slower, and it still has no match model. Anyway, it's just for fun!

    P.S.
    Maybe I should continue improving the "classic" PPM scheme of v0.04 instead...


