OK, new PPMX v0.05 is here! Lots of improvements since v0.04, including the newly invented SEE, plus heavy code and parameter optimizations. All in all, it's the first actual release of my PPMX!
Enjoy!
SFC -> 12,314,165 bytes (with no filters)
calgary.tar -> 775,647 bytes
canterbury.tar -> 516,063 bytes
Hehe, your compressed size for calgary.tar grew larger and larger in your previews in the 0.04 thread, and with the release it is now even higher...
Is 0.05 now at least faster than 0.04, or was the change due to problems with decompression?
Could you give a short briefing about modeling details?
BIT Archiver homepage: www.osmanturan.com
Technically, PPMX v0.05 = PPMX v0.04 + SEE, i.e. the same model set, etc. However, the model is larger, and all parameters were optimized with my new automated optimizer. SEE is very important with PPM. SEE adjusts the escape count/probability based on some additional information - the SEE context. For example, the SEE context may contain various fields and flags such as: do we have masked symbols? model order, quantized total count, etc.
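A minimal sketch of what such a SEE lookup might look like (the exact fields, bit widths and update rule here are my own assumptions for illustration, not PPMX internals):

```cpp
#include <cstdint>

// Hypothetical SEE state: an adaptive escape-probability estimator kept per SEE context.
struct SeeState {
    uint16_t escapes = 1;  // escapes observed in this SEE context
    uint16_t total   = 2;  // total coding decisions observed
    uint32_t escapeP() const { return (uint32_t(escapes) << 12) / total; } // ~12-bit probability
    void update(bool escaped) {
        escapes += escaped;
        ++total;
        if (total >= 4096) { escapes = (escapes + 1) >> 1; total = (total + 1) >> 1; } // stay adaptive
    }
};

// Hypothetical packing of a SEE context from the fields mentioned above:
// a masked-symbols flag, the model order, and a quantized total count.
static inline unsigned seeContextIndex(bool hasMaskedSymbols, unsigned order, unsigned totalCount) {
    unsigned qTotal = totalCount < 4 ? totalCount : (totalCount < 16 ? 4u : 5u); // crude quantizer
    return (hasMaskedSymbols ? 1u : 0u)          // 1 bit
         | ((order > 7u ? 7u : order) << 1)      // 3 bits
         | (qTotal << 4);                        // 3 bits -> 7-bit index into a SeeState table
}
```

An array of SeeState indexed by seeContextIndex() would then replace a fixed escape estimate during coding.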
Is your context for SEE aligned? Say 16x, 32x, 64x etc? I'm asking this because it has a good effect on BIT.
BIT Archiver homepage: www.osmanturan.com
Hello everyone,
great news!
IIRC you thought about opening the source code. Are you ready for this step...?
Best regards!
Again I didn't manage to finish the next update before you released it.
Will do quick tests tomorrow.
PPMX might well become a replacement for the old PPMd.
I wonder how much you'll be able to improve ratio & speed further...
What other improvements can we expect concerning PPMX ?
It would be cool to optimize NTFS compression. The NTFS file system has the simplest LZSS encoder/decoder. Currently, the encoder is oriented towards fastest decompression, keeping lots of "air" in the compressed stream. Making an optimized version would make compressed files smaller at the cost of compression time. Anyway, the decompression speed will be the same or even faster...
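To illustrate what "optimized" could mean here (this is not the actual NTFS code, and the bit costs and limits below are placeholders): a parser that chooses matches by total output cost instead of greedily, which shrinks the stream while the decoder stays plain LZSS and just as fast.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Placeholder cost model: 9 bits per literal, 17 bits per match token.
constexpr int LIT_BITS = 9, MATCH_BITS = 17, MIN_MATCH = 3, MAX_MATCH = 258, WINDOW = 4096;

// Brute-force longest-match search -- fine for an illustration, far too slow for real use.
static int longestMatch(const std::vector<uint8_t>& d, size_t i) {
    int best = 0;
    size_t start = i > WINDOW ? i - WINDOW : 0;
    for (size_t j = start; j < i; ++j) {
        int len = 0;
        while (len < MAX_MATCH && i + len < d.size() && d[j + len] == d[i + len]) ++len;
        if (len > best) best = len;
    }
    return best;
}

// Optimal parse: cost[i] is the cheapest encoding of d[i..end). A greedy encoder always
// takes the longest match; here a shorter match (or a literal) wins whenever it leads to
// a cheaper continuation. step[i] records the chosen token length at each position.
std::vector<int> optimalParse(const std::vector<uint8_t>& d) {
    const size_t n = d.size();
    std::vector<long long> cost(n + 1, std::numeric_limits<long long>::max());
    std::vector<int> step(n, 1);
    cost[n] = 0;
    for (size_t i = n; i-- > 0; ) {
        cost[i] = cost[i + 1] + LIT_BITS;            // literal
        step[i] = 1;
        int best = longestMatch(d, i);
        for (int len = MIN_MATCH; len <= best; ++len) {
            long long c = cost[i + len] + MATCH_BITS;
            if (c < cost[i]) { cost[i] = c; step[i] = len; }
        }
    }
    return step; // walk forward: step[i] == 1 -> emit a literal, otherwise a match of that length
}
```

The decoder never sees the difference: it is the same LZSS bitstream, just with less "air" in it.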
If you have an algorithm for file-system compression, you will be able to submit it for ZFS.
ZFS is the shit!
You can post some ideas at ntfs-3g - a free/opensource software implementation of ntfs.
http://www.ntfs-3g.org/
PPMX's homepage:
http://encode.su/ppmx/
For the last few months I've been working hard on the new PPMX 0.06. The new PPMX was rewritten from scratch many times. I tried many techniques, from simple trees to hashed linked lists. Experimented with SSE2 (Streaming SIMD Extensions 2). Got an extreme speedup with low order models. All in all, writing a good PPM that can compete with PPMd is a rather complex task. Additionally, I explored a new SEE technique, far superior to the one I used previously. So, the new PPMX will be oriented towards speed and efficiency. It's already four times faster than the previous release...
Well, currently, an order-5 PPMX compresses book1 to 214,439 bytes. By adding a more aggressive model update we may lose some compression on text files, but we will get a serious compression gain on binary files. Anyway, I plan to achieve at least 215,xxx bytes on book1 and at the same time be cool with binaries...
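As a rough illustration of what a "more aggressive model update" can mean (the constants are my own choices, not PPMX's): a larger increment combined with earlier rescaling makes recent symbols dominate the statistics, which adapts faster on nonstationary binary data but costs a little on stationary text like book1.

```cpp
#include <cstdint>

// Hypothetical per-context symbol frequencies for a PPM model.
struct Context {
    uint16_t freq[256];
    uint32_t total;
};

// increment = 1 with a high rescale limit ~ conservative update (good for text);
// increment = 4 with a low rescale limit  ~ aggressive update (good for binaries).
void updateModel(Context& ctx, int symbol, int increment, uint32_t rescaleLimit) {
    ctx.freq[symbol] += increment;
    ctx.total += increment;
    if (ctx.total >= rescaleLimit) {
        ctx.total = 0;
        for (int i = 0; i < 256; ++i) {          // halve all counts: old statistics fade,
            ctx.freq[i] -= ctx.freq[i] >> 1;     // recent symbols dominate the prediction
            ctx.total += ctx.freq[i];
        }
    }
}
```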
Finally tested ppmx 0.05. http://mattmahoney.net/dc/text.html#1936
Thanks a lot!
I continue working on PPMX. This time I'm making PPMX small and fast, and it MUST be released, since the current version (0.05) is so unoptimized compared to what I've got now. PPMX 0.05 has many redundant computations, lots of inefficient and dummy code... The new version has extremely simple and flexible code; it's not overloaded with extra stuff, but it uses some tricks to gain a little bit of compression. It's a (relatively) low order PPM (order-4), and this fact makes it quite specific - on some files like english.dic and rafale.bmp it's really efficient, on others it's not that efficient. As an option I may add a small LZP preprocessor. Anyway, the goal is a new PPM-based compressor that has different properties than PPMd. The new PPMX uses optimized hashing, so its memory usage is fixed and there is no need to flush or rebuild any tree like in PPMd...
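A rough sketch of the kind of fixed-memory context hashing this implies (my own guess at the structure; the real PPMX internals aren't published): contexts map into a fixed-size table of slots, collisions simply overwrite the slot, so memory never grows and nothing has to be flushed or rebuilt as with PPMd's tree.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical slot for one hashed order-4 context. Storing a full 256-entry
// frequency array per slot is wasteful but keeps the sketch simple.
struct ContextSlot {
    uint32_t tag = 0;        // detects hash collisions between different contexts
    uint16_t freq[256] = {}; // symbol statistics for this context
};

class HashedModel {
public:
    explicit HashedModel(unsigned bits = 18)     // 2^18 slots * ~516 B ~= 135 MB, fixed forever
        : mask_((1u << bits) - 1), table_(size_t(1) << bits) {}

    // ctx packs the last 4 bytes of history into one 32-bit word.
    ContextSlot& lookup(uint32_t ctx) {
        uint32_t h = ctx * 2654435761u;          // Knuth multiplicative hash
        ContextSlot& slot = table_[h & mask_];
        if (slot.tag != ctx) {                   // empty slot or collision: just reuse it,
            slot.tag = ctx;                      // dropping the old statistics instead of
            std::memset(slot.freq, 0, sizeof(slot.freq)); // growing or rebuilding anything
        }
        return slot;
    }
private:
    uint32_t mask_;
    std::vector<ContextSlot> table_;
};
```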
It's already good if there's some progress.
But I'd suggest avoiding tuning stuff to specific formats like wordlists or uncompressed images -
it's too easy to make better specific models for these.
Also, I still think that the tree is the main feature of PPM. With hashtables it would probably be
better to do CM over an order-1 Huffman code or something.
I spent far more time on PPMX than on BCM... PPMX is MUCH more complex. And it's much more interesting to work on a real context coder that has no BWT traces.
It's not really about the tuning. PPM encodes a symbol via just one context; sometimes higher order contexts (usually order-5 and above) may provide completely wrong predictions. A good example is a simple high-order PPM's performance on the already mentioned rafale.bmp and english.dic. To avoid that, we would have to add stuff like II (information inheritance), which doesn't really help much, or the much more computationally expensive LOE (local order estimation) and/or SSE, making PPM too heavy - at that point these days it's better to write a CM. Today a PPM should be fast, memory efficient and simple.
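For context, SSE in the binary form used by CM coders is essentially an adaptive remapping of the model's probability through a table indexed by the quantized probability plus a bit of side context; a rough sketch (all sizes and rates are my own choices) of the extra per-decision work it adds:

```cpp
#include <cstdint>

// Hypothetical SSE table: the primary probability is quantized, combined with a small
// side context, and replaced by an adaptively trained value. This refinement and its
// update run for every coded decision, which is part of what makes a PPM "too heavy".
struct SSE {
    uint16_t map[64][33];
    SSE() {
        for (int c = 0; c < 64; ++c)
            for (int q = 0; q <= 32; ++q)
                map[c][q] = uint16_t(q * 4096 / 32);      // start as an identity mapping
    }
    // p12: 12-bit probability from the model; ctx: 6-bit side context.
    uint16_t refine(unsigned p12, unsigned ctx) const {
        return map[ctx & 63][p12 >> 7];                   // 32 probability buckets
    }
    void update(unsigned p12, unsigned ctx, int bit) {
        uint16_t& v = map[ctx & 63][p12 >> 7];
        v = uint16_t(v + (((bit << 12) - int(v)) >> 5));  // drift toward the observed outcome
    }
};
```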
Yeah, so as I said... there's no sense in caring about compression of raw (not preprocessed) text, wavs, bmps, or exes -
and when you consider writing a fast codec for any of these, there's no need for context switching (PPM) or mixing (CM) -
just plain structured symbol coding gives good enough compression and is fastest by definition.