
Thread: PPMX v0.05 - new PPM-based compressor

  1. #1
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

    PPMX v0.05 - new PPM-based compressor

    OK, the new PPMX v0.05 is here! Lots of improvements since v0.04, including a newly invented SEE and heavy code and parameter optimizations. All in all, it's the first real release of my PPMX!

    Enjoy!
    Attached Files

  2. #2
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    SFC -> 12,314,165 bytes (with no filters)

    calgary.tar -> 775,647 bytes
    canterbury.tar -> 516,063 bytes


  3. #3
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Hehe, your compressed size for calgary.tar grew with each of your previews in the 0.04 thread, and with this release it is even higher...
    Is 0.05 at least faster than 0.04 now, or was the change needed because of problems with decompression?

  4. #4
    Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Could you give a short briefing about modeling details?
    BIT Archiver homepage: www.osmanturan.com

  5. #5
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by osmanturan
    Could you give a short briefing about modeling details?
    Technically, PPMX v0.05 = PPMX v0.04 + SEE, i.e. the same model set, etc. However, the model is larger, and ALL parameters were optimized with my new automated optimizer. SEE (secondary escape estimation) is very important for PPM: it adjusts the escape count/probability based on some additional information, the SEE context. For example, the SEE context may contain various fields and flags, such as: Do we have masked symbols? The model order, the quantized total count, etc.
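To make the idea concrete, here is a minimal Python sketch of an SEE table, assuming a hypothetical 7-bit SEE context built from the three fields mentioned above (model order, a masked-symbols flag, and a quantized total count). Packing the fields into bits gives a power-of-two table (128 cells here). The field widths, the bit-length quantization, and the rescale limit are illustrative assumptions, not PPMX's actual parameters.

```python
class SEE:
    """Secondary escape estimation: one adaptive escape probability per SEE context.

    Hypothetical 7-bit context layout (128 cells):
      bits 0-2: model order, clamped to 0..7
      bit  3  : 1 if some symbols are masked (we escaped from a higher order)
      bits 4-6: quantized total count of the current context (bit length, clamped to 0..7)
    """

    def __init__(self, limit: int = 1024):
        # Each cell holds [escape count, non-escape count]; [1, 1] means p(escape) = 0.5.
        self.cells = [[1, 1] for _ in range(128)]
        self.limit = limit

    @staticmethod
    def index(order: int, has_masked: bool, total: int) -> int:
        order_q = min(order, 7)
        total_q = min(total.bit_length(), 7)   # roughly log2 of the total count
        return order_q | (int(has_masked) << 3) | (total_q << 4)

    def p_escape(self, ctx: int) -> float:
        esc, hit = self.cells[ctx]
        return esc / (esc + hit)

    def update(self, ctx: int, escaped: bool) -> None:
        cell = self.cells[ctx]
        cell[0 if escaped else 1] += 1
        # Halve both counts (rounding up, so neither reaches 0) to keep adapting.
        if cell[0] + cell[1] > self.limit:
            cell[0] = (cell[0] + 1) // 2
            cell[1] = (cell[1] + 1) // 2


# Usage: look up the SEE cell, code the escape flag with p, then learn from the outcome.
see = SEE()
ctx = SEE.index(order=4, has_masked=True, total=37)
p = see.p_escape(ctx)
see.update(ctx, escaped=False)
```

Halving the counts keeps each cell responsive to recent data instead of converging on a long-run average.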

  6. #6
    Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Is your context for SEE aligned? Say 16x, 32x, 64x etc? I'm asking this because it has a good effect on BIT.
    BIT Archiver homepage: www.osmanturan.com

  7. #7
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by osmanturan
    Is your context for SEE aligned? Say 16x, 32x, 64x etc? I'm asking this because it has a good effect on BIT.
    Yep!

  8. #8
    Member Vacon
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    great news!
    IIRC you were thinking about opening the source code. Are you ready for this step...?

    Best regards!

  9. #9
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Vacon
    IIRC you were thinking about opening the source code. Are you ready for this step...?
    Not yet. As with all of my programs, I'll release it as an open-source project on SourceForge.net only after I finish working on it. Furthermore, I have some plans to make a PPMX compression library. Anyway, first of all I need to beat PPMd...

  10. #10
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Quote Originally Posted by encode
    OK, the new PPMX v0.05 is here! Lots of improvements since v0.04, including a newly invented SEE and heavy code and parameter optimizations. All in all, it's the first real release of my PPMX!

    Enjoy!
    Thanks Ilia!

  11. #11
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Again I didn't manage to finish the next update before you released it.

    Will do quick tests tomorrow.

  12. #12
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    PPMX might well become a replacement for the old PPMd.
    I wonder how much further you'll be able to improve ratio & speed...
    What other improvements can we expect in PPMX?

  13. #13
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by pat357
    What other improvements can we expect in PPMX?
    Considering that I've only just started... you can expect anything!

  14. #14
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode
    Not yet. As with all of my programs, I'll release it as an open-source project on SourceForge.net only after I finish working on it. Furthermore, I have some plans to make a PPMX compression library. Anyway, first of all I need to beat PPMd...
    Do you intend to release LZSS sources too?

  15. #15
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by m^2
    Do you intend to release LZSS sources too?
    If people are interested. Also, I have a few ideas for further LZSS improvements, including a tiny ASM-driven decoder.
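For readers unfamiliar with LZSS, a minimal Python decoder sketch shows why the decoder can be so tiny. The token layout here is purely hypothetical (it is neither encode's LZSS format nor NTFS's): a flag byte whose bits, LSB first, mark each of the next 8 items as a literal (0) or a match (1); a match is a 16-bit little-endian word holding a 12-bit offset minus 1 and a 4-bit length minus 3.

```python
def lzss_decode(src: bytes) -> bytes:
    """Decode a stream in a hypothetical LZSS format (flag byte + literals/matches)."""
    out = bytearray()
    i = 0
    while i < len(src):
        flags = src[i]
        i += 1
        for bit in range(8):
            if i >= len(src):
                break
            if flags & (1 << bit):
                # Match: 12-bit offset (stored minus 1), 4-bit length (stored minus 3).
                token = src[i] | (src[i + 1] << 8)
                i += 2
                offset = (token & 0x0FFF) + 1
                length = (token >> 12) + 3
                start = len(out) - offset
                # Copy byte by byte so overlapping matches (offset < length) work.
                for k in range(length):
                    out.append(out[start + k])
            else:
                out.append(src[i])  # literal byte
                i += 1
    return bytes(out)


# "abc" as three literals, then a match with offset 3 and length 6: flags = 0b00001000.
stream = bytes([0x08]) + b"abc" + (2 | (3 << 12)).to_bytes(2, "little")
print(lzss_decode(stream))  # b'abcabcabc'
```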

  16. #16
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by encode
    If people are interested. Also, I have a few ideas for further LZSS improvements, including a tiny ASM-driven decoder.
    I am interested.

  17. #17
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    It would be cool to optimize NTFS compression. The NTFS file system uses a very simple LZSS encoder/decoder. Currently, the encoder is tuned for speed, leaving lots of "air" in the compressed stream. An optimized version would make compressed files smaller at the cost of compression time. Anyway, decompression speed would stay the same or even improve...
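The trade-off described above (smaller output for more encoder time, with an unchanged decoder) is the difference between greedy and optimal parsing. Below is a minimal Python sketch, assuming a hypothetical LZSS cost model of 9 bits per literal and 17 bits per match (this is not NTFS's actual LZNT1 format). The greedy encoder always takes the longest match, while the optimal one runs a backward dynamic program over all match lengths. Both choose tokens from the same format, so the same decoder reads either stream.

```python
LITERAL_BITS = 9    # hypothetical: 1 flag bit + 8-bit literal
MATCH_BITS = 17     # hypothetical: 1 flag bit + 16-bit offset/length token
MIN_MATCH, MAX_MATCH, WINDOW = 3, 18, 4096


def longest_match(data: bytes, i: int) -> int:
    """Length of the longest match for data[i:] starting in the previous WINDOW bytes."""
    best = 0
    for j in range(max(0, i - WINDOW), i):
        n = 0
        while n < MAX_MATCH and i + n < len(data) and data[j + n] == data[i + n]:
            n += 1
        best = max(best, n)
    return best


def greedy_cost(data: bytes) -> int:
    """Bits used when the encoder always takes the longest available match."""
    i, bits = 0, 0
    while i < len(data):
        n = longest_match(data, i)
        if n >= MIN_MATCH:
            bits, i = bits + MATCH_BITS, i + n
        else:
            bits, i = bits + LITERAL_BITS, i + 1
    return bits


def optimal_cost(data: bytes) -> int:
    """Minimum bits over all parses: cost[i] is the cheapest encoding of data[i:]."""
    cost = [0] * (len(data) + 1)
    for i in range(len(data) - 1, -1, -1):
        cost[i] = LITERAL_BITS + cost[i + 1]
        # Any prefix of the longest match is also a valid match at the same offset.
        for length in range(MIN_MATCH, longest_match(data, i) + 1):
            cost[i] = min(cost[i], MATCH_BITS + cost[i + length])
    return cost[0]


sample = b"abracadabra abracadabra abracadabra"
print(greedy_cost(sample), optimal_cost(sample))
```

Since the greedy parse is one of the parses the dynamic program considers, the optimal cost can never be worse; the price is the extra search work on the encoder side only.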

  18. #18
    Member
    Join Date
    Jun 2008
    Location
    Lévis, Canada
    Posts
    30
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If you have an algorithm for file-system compression, you will be able to submit it for ZFS.

    ZFS is the shit!

  19. #19
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by encode
    It would be cool to optimize NTFS compression. The NTFS file system uses a very simple LZSS encoder/decoder. Currently, the encoder is tuned for speed, leaving lots of "air" in the compressed stream. An optimized version would make compressed files smaller at the cost of compression time. Anyway, decompression speed would stay the same or even improve...
    You could post some ideas to ntfs-3g, a free/open-source implementation of NTFS.

    http://www.ntfs-3g.org/

  20. #20
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by lunaris
    You could post some ideas to ntfs-3g, a free/open-source implementation of NTFS.

    http://www.ntfs-3g.org/
    MS would have to support it. With different compression it wouldn't be NTFS anymore.

  21. #21
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    PPMX's homepage:
    http://encode.su/ppmx/


  22. #22
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    For the last few months I've been working hard on a new PPMX 0.06. The new PPMX has been rewritten from scratch many times. I tried many techniques, from simple trees to hashed linked lists, and experimented with SSE2 (Streaming SIMD Extensions 2), getting an extreme speedup with the low-order models. All in all, writing a good PPM that can compete with PPMd is a rather complex task. Additionally, I explored a new SEE technique, far superior to the one I used previously. So the new PPMX will be oriented toward speed and efficiency. It's already four times faster than the previous release...

  23. #23
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Well, currently an order-5 PPMX compresses book1 to 214439 bytes. With a more aggressive model update we may lose some compression on text files, but get a serious compression gain on binary files. Anyway, I plan to achieve at least 215xxx bytes on book1 while still doing well on binaries...
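One way to read "a more aggressive model update" is a larger count increment with earlier rescaling, so a context forgets old statistics faster. The Python sketch below assumes simple frequency-count contexts; the increment and limit values are hypothetical, chosen only to show the effect after the data's statistics change, as often happens in binaries.

```python
class ContextStats:
    """Symbol frequencies of one context with a configurable update rule."""

    def __init__(self, increment: int, limit: int):
        self.increment = increment  # how much a coded symbol's count grows
        self.limit = limit          # rescale once the total exceeds this
        self.freq = {}
        self.total = 0

    def prob(self, sym) -> float:
        return self.freq.get(sym, 0) / self.total if self.total else 0.0

    def update(self, sym) -> None:
        self.freq[sym] = self.freq.get(sym, 0) + self.increment
        self.total += self.increment
        if self.total > self.limit:
            # Halve every count (rounding up, so seen symbols stay above 0).
            for s in self.freq:
                self.freq[s] = (self.freq[s] + 1) // 2
            self.total = sum(self.freq.values())


# Same history for both: 200 x 'a', then the source switches to 'b'.
conservative = ContextStats(increment=1, limit=4096)
aggressive = ContextStats(increment=32, limit=1024)
for sym in "a" * 200 + "b" * 20:
    conservative.update(sym)
    aggressive.update(sym)
print(f"p(b): conservative {conservative.prob('b'):.3f}, aggressive {aggressive.prob('b'):.3f}")
```

The aggressive rule adapts to the switch much sooner; on stationary text, the conservative rule usually keeps more precise statistics, which matches the text-versus-binary trade-off described above.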

  24. #24
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Finally tested ppmx 0.05. http://mattmahoney.net/dc/text.html#1936

  25. #25
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Thanks a lot!

  26. #26
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I'm continuing to work on PPMX. This time I'm making PPMX small and fast, and it MUST be released, since the current version (0.05) is so unoptimized compared to what I've got now. PPMX 0.05 has many redundant computations and lots of inefficient and dummy code... The new version has extremely simple and flexible code; it's not overloaded with extra stuff, but it uses some tricks to gain a little compression. It's a (relatively) low-order PPM (order-4), and this makes it quite specific: on some files, like english.dic and rafale.bmp, it's really efficient; on others it's not. As an option I may add a small LZP preprocessor. Anyway, the goal is a new PPM-based compressor with different properties than PPMd. The new PPMX uses optimized hashing, so its memory usage is fixed and there's no need to flush or rebuild a tree as in PPMd...
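A minimal Python sketch of fixed-memory hashed context storage, for contrast with PPMd's tree: the last 4 bytes are hashed into a table whose size never changes, and a colliding context simply takes over the slot. The multiplicative hash, the 8-bit check, and the table size are assumptions for illustration; this is not PPMX's actual hashing.

```python
class HashedContextTable:
    """Fixed-size table of per-context symbol counts, addressed by a hash."""

    def __init__(self, bits: int = 16):
        self.bits = bits
        self.checks = [0] * (1 << bits)    # 8-bit check per slot to catch some collisions
        self.slots = [None] * (1 << bits)  # symbol -> count dict for each slot

    def lookup(self, ctx: int) -> dict:
        h = (ctx * 2654435761) & 0xFFFFFFFF  # Knuth-style multiplicative hash
        index = h >> (32 - self.bits)        # top bits pick the slot
        check = h & 0xFF                     # low bits: a cheap (weak) verifier
        if self.slots[index] is None or self.checks[index] != check:
            # Empty slot or detected collision: reuse the slot in place.
            # Memory stays fixed, so there is no tree to flush or rebuild.
            self.checks[index] = check
            self.slots[index] = {}
        return self.slots[index]


def order4_context(history: bytes) -> int:
    """Pack the last 4 bytes (zero-padded at the start of data) into an int."""
    return int.from_bytes(history[-4:].rjust(4, b"\0"), "big")


table = HashedContextTable(bits=12)
data = b"abracadabra abracadabra"
for i in range(len(data)):
    counts = table.lookup(order4_context(data[:i]))
    counts[data[i]] = counts.get(data[i], 0) + 1
print(table.lookup(order4_context(b"abra")))
```

The cost of fixed memory is that an evicted context loses its statistics, which is the trade-off against PPMd's approach of growing a tree and restarting or rebuilding it when memory runs out.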

  27. #27
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    It's already good that there's some progress
    But I'd suggest avoiding tuning stuff to specific formats like wordlists or uncompressed images -
    it's too easy to make better specialized models for these.
    Also, I still think that the tree is the main feature of PPM. With hashtables it would probably be
    better to do CM over order-1 Huffman codes or something.

  28. #28
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Shelwien
    It's already good that there's some progress
    I've spent far more time on PPMX than on BCM... PPMX is MUCH more complex. And it's much more interesting to work on a real context coder that isn't bound by BWT.

    Quote Originally Posted by Shelwien
    But I'd suggest avoiding tuning stuff to specific formats like wordlists or uncompressed images -
    it's too easy to make better specialized models for these.
    It's not really about tuning. PPM encodes each symbol via just one context, and sometimes higher-order contexts (usually order-5 and above) give completely wrong predictions. A good example is the performance of a simple high-order PPM on the already mentioned rafale.bmp and english.dic. To avoid that, we'd have to add stuff like II (information inheritance), which doesn't really help, or the much more computationally expensive LOE (local order estimation) and/or SSE, making PPM too heavy; these days it's better to write CM. Today a PPM should be fast, memory-efficient and simple.
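To illustrate "PPM encodes a symbol via just one context", here is a minimal Python sketch of the probability a basic PPM assigns to a symbol: walk the contexts from the highest order down, pay an escape in each context that lacks the symbol, exclude (mask) symbols already seen, and code the symbol in the first context that has it. Escape estimation uses PPM method C (escape count = number of distinct symbols) as a simple stand-in; it is not PPMX's escape model or SEE.

```python
from fractions import Fraction


def ppm_symbol_probability(models, symbol, alphabet_size=256):
    """Probability a basic PPM coder assigns to `symbol`.

    `models` lists the symbol counts of the current contexts, highest order first.
    Escapes use method C; symbols seen in higher orders are masked in lower ones.
    """
    p = Fraction(1)
    masked = set()
    for counts in models:
        visible = {s: c for s, c in counts.items() if s not in masked and c > 0}
        if not visible:
            continue  # nothing left to predict here: skip without coding an escape
        total = sum(visible.values())
        escape = len(visible)  # method C escape count
        if symbol in visible:
            # The symbol is coded in this one context only.
            return p * Fraction(visible[symbol], total + escape)
        p *= Fraction(escape, total + escape)
        masked.update(visible)
    # Order -1 fallback: uniform over the symbols not yet masked.
    return p / (alphabet_size - len(masked))


# The order-2 context has seen only 'a'; the order-1 context has seen 'a' and 'b'.
models = [{"a": 3}, {"a": 3, "b": 1}]
print(ppm_symbol_probability(models, "a"))  # 3/4
print(ppm_symbol_probability(models, "b"))  # 1/8
```

If the order-2 context's statistics are misleading for the current data, every symbol it lacks pays its escape cost, and nothing from the lower orders is blended in, which is the weakness the post describes.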

  29. #29
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Yeah, so as I said... there's no point in caring about compression of raw (not preprocessed) text, wavs, bmps, or exes -
    and when you consider writing a fast codec for any of these, there's no need for context switching (PPM) or mixing (CM) -
    just plain structured symbol coding gives good enough compression and is fastest by definition.

  30. #30
    The Founder encode
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by Shelwien
    ...and when you consider writing a fast codec for any of these, there's no need for context switching (PPM) or mixing (CM) -
    just plain structured symbol coding gives good enough compression and is fastest by definition.
    What do you mean, structured symbol coding?
