Thread: PPMX 0.08 is here!

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,965
    Thanks
    367
    Thanked 343 Times in 134 Posts

    PPMX 0.08 is here!

    Hello everyone!

    As a New Year gift, please welcome a new version of PPMX!

    What's new:
    • The maximum model order is 6. PPMX uses order skipping, so the actual model set is order-6-4-2-1-0-(-1).
    • Heavily improved escape handling. Instead of straight SEE, PPMX makes use of "escape adjustment" tricks based on previous escape history. My experiments show that these tricks are as efficient as SEE in terms of compression while being less computationally expensive, and thus much faster.


    New PPMX can be downloaded at my homepage:
    compressme.net


    Quick testing results:

    ENWIK9: 1000000000 -> 202868559 in 107.5 sec

    ENWIK8: 100000000 -> 23204040 in 11.9 sec

    bookstar: 35594240 -> 9641612 in 6.1 sec

    osho.txt: 206908949 -> 36675241 in 14.5 sec

    3200.txt: 16013962 -> 3891976 in 1.7 sec

    world95.txt: 2988578 -> 514531 in 0.2 sec

    calgary.tar: 3152896 -> 791883 in 0.4 sec


    * Compression time is as shown by PPMX, thus it includes I/O time.
    * Tested on Core i7-2600K @ 4.6GHz, 8GB DDR3 @ 1866MHz, 240GB Corsair Force GT SSD
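
    For easier comparison across files, the figures above can be turned into bits per character and throughput. This is only an arithmetic sketch over the numbers quoted in this post; the helper names `bits_per_char` and `throughput_mb_s` are made up for illustration, and MB here means 10^6 bytes.

```python
# (name, original bytes, compressed bytes, seconds) copied from the results above
RESULTS = [
    ("ENWIK9", 1000000000, 202868559, 107.5),
    ("ENWIK8", 100000000, 23204040, 11.9),
    ("bookstar", 35594240, 9641612, 6.1),
    ("osho.txt", 206908949, 36675241, 14.5),
    ("3200.txt", 16013962, 3891976, 1.7),
    ("world95.txt", 2988578, 514531, 0.2),
    ("calgary.tar", 3152896, 791883, 0.4),
]


def bits_per_char(original, compressed):
    """Compressed bits spent per input byte."""
    return 8.0 * compressed / original


def throughput_mb_s(original, seconds):
    """Input bytes processed per second, in units of 10^6 bytes."""
    return original / seconds / 1e6


for name, orig, comp, secs in RESULTS:
    print(f"{name:12s} {bits_per_char(orig, comp):.3f} bpc  "
          f"{throughput_mb_s(orig, secs):.1f} MB/s")
```

    The timings for the small files have only one significant digit, so their throughput values are rough.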

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I tested it a year early (it is still 2011 here for a few hours). http://mattmahoney.net/dc/text.html#1936

  3. #3
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    thank you
    and Happy new year!

    Best regards!

  4. #4
    The Founder encode's Avatar
    Quote Originally Posted by Matt Mahoney
    I tested it a year early (it is still 2011 here for a few hours). http://mattmahoney.net/dc/text.html#1936

    Thanks a lot!

    ENWIK9 decompression: 202868559 -> 1000000000 in 127.2 sec

    Memory usage on my machine is the same as you posted - 355,628 K

    (and it's SEE improvements, not SSE)

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,219
    Thanks
    188
    Thanked 962 Times in 496 Posts

  6. #6
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    What's up with finn lst?

  7. #7
    The Founder encode's Avatar
    PPMX is PPM; the others are CM. It's comparing apples to oranges - PPMX uses only ONE context, the highest one possible, to encode a symbol. Of course, order-6 is a bad idea for coding a dictionary. The same thing happens with "english.dic", for example.
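
    To make the "only ONE highest context" point concrete, here is a toy sketch (not PPMX's code) that reports, for each byte, which order of the 6-4-2-1-0-(-1) set from the first post would end up coding it. The name `coding_orders` is hypothetical, a plain set of seen symbols stands in for real frequency counts, and updating every order after each byte is a simplification.

```python
from collections import defaultdict

ORDERS = (6, 4, 2, 1, 0)  # order -1 is the implicit fallback when all of these escape


def coding_orders(data, orders=ORDERS):
    """For each byte, return the order of the single context that codes it."""
    seen = defaultdict(set)  # (order, context bytes) -> symbols seen in that context
    result = []
    for i, sym in enumerate(data):
        used = -1  # stays -1 if every context escapes
        for k in orders:
            if k > i:
                continue  # not enough history for this order yet
            if sym in seen[(k, data[i - k:i])]:
                used = k  # highest context that knows the symbol wins
                break
            # otherwise an escape would be coded and we drop to the next order
        result.append(used)
        # Simplification: update every available order with the new symbol.
        for k in orders:
            if k <= i:
                seen[(k, data[i - k:i])].add(sym)
    return result


print(coding_orders(b"abababab"))  # [-1, -1, 0, 1, 2, 2, 4, 4]
```

    On repetitive input the coding order climbs quickly. On a word list, each new word mostly breaks the long contexts, so the toy model keeps escaping down to low orders.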

  8. #8
    Expert
    Matt Mahoney's Avatar

  9. #9
    The Founder encode's Avatar
    PPMX v0.09 - an upcoming version - the fastest PPMX to date:
    Code:
    C:\ppmx\Release>ppmx c enwik8 enwik8.z
    Compressing enwik8: 100000000->25929359 in 7.082s
    
    C:\ppmx\Release>ppmx c enwik9 enwik9.z
    Compressing enwik9: 1000000000->232458481 in 66.44s
    It's an order-4 PPM with no preprocessing. As an option, I'm thinking about a simple byte-aligned LZP preprocessor - to speed up PPM and handle high-order contexts...


  10. #10
    The Founder encode's Avatar
    Code:
    C:\ppmx\Release>ppmx c enwik9 enwik9.z
    Compressing enwik9: 1000000000->232458481 in 59.452s

  11. #11
    Expert
    Matt Mahoney's Avatar
    When will we see a release?

  12. #12
    The Founder encode's Avatar
    Can't say right now; the work is in progress - I just keep going and going. It's a huge amount of work. I just prefer to release one huge update instead of 200+ versions.
    The goal is to make a PPMd alternative - not a clone. PPMX is based on hash tables and has different properties. The new PPMX is a complete rewrite - I've double-checked and optimized each line of code - making PPMX notably faster than previous releases.

    And the new PPMX is a PPMd competitor. Comparing O4 PPMX to O4 PPMd:
    • On text files it's still slower than PPMd, but not by much
    • PPMX is superfast on highly compressible data (like fp.log), though PPMd is superfast too
    • On binary data PPMX is fast enough, in many cases faster than PPMd
    • On random or analog data PPMX is many times faster than PPMd. Incompressible or, say, audio files are a nightmare for PPMd

    As a note, O4 PPMX is faster than O5 PPMd all the time...

    So, right now I'm working on fundamental data structures and optimizations.
    BTW, PPM is huge in terms of data analysis...

  13. #13
    The Founder encode's Avatar
    New results, baseline version - simple SEE, no recency scaling:
    Code:
    PPMX v0.09 (MAX_ORDER=4 STEP=10)
    
    C:\ppmx\Release>ppmx c enwik9 enwik9.z
    Compressing enwik9: 1000000000->232473441 in 56.456s
    
    C:\ppmx\Release>ppmx c enwik8 enwik8.z
    Compressing enwik8: 100000000->25926756 in 6.24s
    
    C:\ppmx\Release>ppmx c fp.log fp.z
    Compressing fp.log: 20617071->783708 in 0.359s
    
    C:\ppmx\Release>ppmx c calgary.tar calgary.z
    Compressing calgary.tar: 3152896->818685 in 0.25s
    
    C:\ppmx\Release>ppmx c TraktorDJStudio3.exe TraktorDJStudio3.z
    Compressing TraktorDJStudio3.exe: 29124024->6184823 in 2.543s
    
    PPMX v0.09 (MAX_ORDER=4 STEP=12)
    
    C:\ppmx\Release>ppmx c enwik9 enwik9.z
    Compressing enwik9: 1000000000->232717599 in 57.876s
    
    C:\ppmx\Release>ppmx c enwik8 enwik8.z
    Compressing enwik8: 100000000->25964901 in 6.334s
    
    C:\ppmx\Release>ppmx c fp.log fp.z
    Compressing fp.log: 20617071->764584 in 0.359s
    
    C:\ppmx\Release>ppmx c calgary.tar calgary.z
    Compressing calgary.tar: 3152896->816945 in 0.25s
    
    C:\ppmx\Release>ppmx c TraktorDJStudio3.exe TraktorDJStudio3.z
    Compressing TraktorDJStudio3.exe: 29124024->6151110 in 2.59s

  14. #14
    The Founder encode's Avatar
    PPMX v0.09 is here!

    And yet, it's kind of a DRAFT. I've incorporated some important improvements and optimizations, but kept it as a fairly baseline PPM - avoiding complex SEE and other tricks (like delayed counters, SSE for rank0 symbol) which improve compression seriously, but make PPMX somewhat slow...

    LTCB results:
    Code:
    C:\ppmx\Release>ppmx c enwik9 enwik9.z
    Compressing enwik9: 1000000000->232581333 in 57.21s
    
    C:\ppmx\Release>ppmx d enwik9.z e9
    Decompressing enwik9.z: 232581333->1000000000 in 68.77s
    
    C:\ppmx\Release>ppmx c enwik8 enwik8.z
    Compressing enwik8: 100000000->25952954 in 6.31s
    
    C:\ppmx\Release>ppmx d enwik8.z e8
    Decompressing enwik8.z: 25952954->100000000 in 7.52s

  15. #15
    Expert
    Matt Mahoney's Avatar

  16. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    encode (25th March 2014)

  17. #16
    The Founder encode's Avatar
    Thanks Matt!

  18. #17
    The Founder encode's Avatar
    Some testing results with ENWIK9:
    Order-9-5-3-2-1-(-1) -> 183,xxx,xxx bytes
    world95.txt -> 484,xxx bytes

    Max order-11 result is 179,xxx,xxx bytes

    Mem use is about 1.3 GB; speed is notably faster than "PPMd -r1" (same max model order), but slower than "PPMd -r0".

    I use hash tables a la PAQ1 - at each step I add a new context and, after a short linear search, remove an older one with the smallest total_count - i.e. the PPMX model has no global reset - it is balanced at any given step.
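
    A minimal sketch of that replacement policy, under stated assumptions: it is not PPMX's actual code. The `Slot` layout, the use of `zlib.crc32` as the context hash, and the probe length of 4 are all made up for illustration. Only the idea (a short linear search, then evicting the slot with the smallest total_count) comes from the description above.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Slot:
    tag: int = -1            # crc32 of the owning context, -1 means empty
    total_count: int = 0     # sum of all symbol counts in this context
    counts: dict = field(default_factory=dict)  # symbol -> count


class ContextTable:
    def __init__(self, size=1 << 16, probe=4):
        self.slots = [Slot() for _ in range(size)]
        self.probe = probe   # how many neighbouring slots the linear search covers

    def _probe(self, tag):
        n = len(self.slots)
        return [self.slots[(tag + j) % n] for j in range(self.probe)]

    def lookup(self, ctx):
        """Return the slot owned by ctx, or None if it is not in the table."""
        tag = zlib.crc32(ctx)
        for slot in self._probe(tag):
            if slot.tag == tag:
                return slot
        return None

    def find_or_add(self, ctx):
        slot = self.lookup(ctx)
        if slot is not None:
            return slot
        # Miss: evict the probed slot with the smallest total_count.
        tag = zlib.crc32(ctx)
        victim = min(self._probe(tag), key=lambda s: s.total_count)
        victim.tag, victim.total_count, victim.counts = tag, 0, {}
        return victim

    def update(self, ctx, sym):
        slot = self.find_or_add(ctx)
        slot.counts[sym] = slot.counts.get(sym, 0) + 1
        slot.total_count += 1
```

    Because eviction is local to each probe window, the table never needs a global flush: rarely used contexts drop out one at a time as new ones arrive.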

    No SEE, SSE, or LOE whatsoever - plain-Jane PPM. Actually, a proper SEE makes little to no difference with ENWIK9. It really helps on small and binary files though...

    First of all, it's sad that ENWIK9.BWT can be compressed to about 170,xxx,xxx to 180,xxx,xxx bytes with the simplest order-1 encoder - i.e. better than a simple PPM. (That said, the BWT stage eats far more memory.)

    Adding more tricks & flicks will make a PPM encoder slower than is acceptable - better to use CM instead.

    Anyway, trying to do something here - keeping myself away from book1 addiction - the sign of PPM compressors era...


Similar Threads

  1. PPMX 0.07 is here!
    By encode in forum Data Compression
    Replies: 5
    Last Post: 24th February 2011, 22:38
  2. PPMX 0.06 has been released!
    By encode in forum Data Compression
    Replies: 27
    Last Post: 21st February 2011, 01:21
  3. ppmx v0.04 is here!
    By encode in forum Data Compression
    Replies: 62
    Last Post: 17th January 2009, 13:57
  4. ppmx v0.03 is here!
    By encode in forum Data Compression
    Replies: 13
    Last Post: 1st January 2009, 02:21
  5. PPMX v0.02 is here!
    By encode in forum Data Compression
    Replies: 26
    Last Post: 8th December 2008, 22:20
