I want to choose my best compressor and make it my main, flagship project. Which one of my compressors works best for you? Vote in the poll and/or post your opinion!
An improved BALZ v1.13 (ROLZ compression)
An improved BCM v0.02 (BWT compression)
The brand new CM-based compressor
The brand new LZ77-based compressor
Other, not listed
PIMPLE v1.43 is still my favorite.
Here are my comments about your well-known compressors:
QUAD
I really like it. It is fast and efficient. You may focus on optimal parsing for future improvement. I think that without optimal parsing we cannot make a big step forward.
BCM
I know it's your first BWT project, so anything can be expected. For me, it's a bit slow. I think this is mostly due to BWT worst-case behavior on certain data.
TC
It seems efficient, but not very fast. It works like a PAQ clone. Don't misunderstand me: I don't mean that you copied code from the PAQ source, just that it runs as slowly as PAQ.
PIMPLE and PIM
They are very practical compressors. I like them.
LZPM
It's a strong archiver in your collection. For me, it's the big brother of QUAD. I wish QUAD had the same compression level at its current speed.
BALZ
I have disliked this project since it started; I never liked it. Again, its latest release is very slow and only pays off on highly redundant data. It's impractical for me.
I think the best thing is for you to focus on PIMPLE/PIM or QUAD/LZPM. If you go the QUAD/LZPM route, you should keep the LZ nature (enough speed with an acceptable compression level). We don't need the highest compression from an LZ-based compressor, but you may add an extra option.
I hope these comments help you choose your way. Good luck!
Hmm, tough question... I'd like to see some new pure CM inspired by PIMPLE and TC, and/or a new segmentation filter; this combination could easily be your new flagship!
I think the new BCM (BWT compression) is a good step toward becoming a star in the world of compression. From 0.01 to 0.02 it compresses better in less time. But I don't know: is there a big unsolved problem with this program?
Stephan Busch wrote:
"fails in compression of a special dataset with very redundant data,
on which some others fail as well
(p.ex. BBB by Matt Mahoney and also Florin Ghido's QLFC had problems).
"means in this case that it has been working for over 24 hours on that
dataset in sorting stage and didn't come to encoding of that certain block"
If it is an unsolvable problem for the program, maybe it would be better to leave BCM and do further improvements to BALZ/TC instead. Or why not try to improve PIMPLE 2.0?
It may be fixed by adding an LZP preprocessor or by improving the sorting algorithm.
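Just to illustrate the idea, a minimal LZP-style preprocessing pass could look roughly like the sketch below. This is not BCM's actual code; the escape byte, minimum match length and table size are arbitrary choices for the sketch, and the matching decoder is omitted. The point is only that long repeats predicted by a small context get collapsed, so the BWT sorting stage no longer sees huge runs of near-identical data.
Code:
#include <cstdint>
#include <vector>

// Collapse long repeats predicted by an order-4 context into (ESC, length) tokens.
std::vector<uint8_t> lzp_encode(const std::vector<uint8_t>& in) {
    const uint8_t ESC = 0xFE;                 // escape byte (arbitrary for this sketch)
    const size_t MIN_MATCH = 32;              // only collapse reasonably long matches
    std::vector<uint32_t> table(1 << 16, 0);  // context hash -> last position
    std::vector<uint8_t> out;
    uint32_t ctx = 0;                         // rolling order-4 context
    size_t i = 0;
    while (i < in.size()) {
        uint32_t h = (ctx * 2654435761u) >> 16;
        size_t p = table[h];                  // previous position seen after this context
        table[h] = (uint32_t)i;
        size_t len = 0;
        while (p && i + len < in.size() && in[p + len] == in[i + len]
               && len < MIN_MATCH + 254)      // cap so the length fits one byte
            ++len;
        if (len >= MIN_MATCH) {               // long match: emit ESC + length
            out.push_back(ESC);
            out.push_back((uint8_t)(len - MIN_MATCH));
            for (size_t k = 0; k < len; ++k) ctx = (ctx << 8) | in[i + k];
            i += len;
        } else {                              // literal; escape the escape byte itself
            uint8_t c = in[i++];
            out.push_back(c);
            if (c == ESC) out.push_back(0xFF);
            ctx = (ctx << 8) | c;
        }
    }
    return out;
}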
Improving PIMPLE means a complete rewrite, i.e. a brand new CM coder. PIMPLE is too slow. Since then, I have explored many new CM-related things: a new arithmetic encoder, new and efficient fast counters, fast mixers, etc.
Having said that, the new CM may be as fast as the current BCM at decompression while being MUCH stronger at compression. I noticed that LZ/BWT stages seriously limit CM compression.
I believe that many people are waiting for an open-source context mixing compressor (library) with the compression ratio of CCM/CMM/LPAQ and the speed of CCM. It could be done by creating a new compressor or by improving the speed of LPAQ.
This compressor could take the place of the widely used PPMd (e.g. in WinRAR, WinZip, XMLPPM, SCMPPM, XBzip).
Przemyslaw
I noticed that CCM is not a pure CM compressor. It has a switch which turns on an LZ layer for extra speed. I think its speed comes from there; otherwise we could not reach such speed (1-3 MB/s on my laptop) with pure CM. toffer has written a pure CM which achieves ~1 MB/s, and he has done as much speed optimization as possible. Both of them have an automatic switch to turn submodels on/off to gain speed. I think we could make a CM compressor which has an LZ submodel. I know a match model already does something similar, but I mean a totally different thing.
What do you mean by "brand new CM/LZ"? Writing new compressors? I'm mostly interested in statistical compression.
@osmanturan: there are more things which can be done, especially with higher orders.
AFAIK, CCM has order 0-4 and order-6 contexts, a match model and some sparse models. It also has well designed filters: an x86 and a delta transform. Roughly, the differences between CCM and CMM are the sparse models and the delta filter; both of them have an x86 transform. So roughly everything is the same.
Did you notice that CCM and CMM have the same speed on some files? These kinds of files don't compress well with dictionary-based methods. CCM is mainly optimized for common files; when compressing a common file it achieves about ~3 MB/s. I think this tells us it has an LZ layer.
I meant that when there are only a few symbols under a high-order context, you can use more space-efficient memory structures, e.g. store some tree-like structure per byte to avoid hashing.
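As a purely illustrative sketch (a hypothetical structure, not CMM's actual code), such a per-context node might look like this: the few observed symbols are kept in a tiny inline array, and only an overflowing context is promoted to a larger hashed structure.
Code:
#include <cstdint>

// A node for a high-order context that usually sees only a few distinct symbols.
struct SparseNode {
    uint8_t sym[4];    // distinct next-symbols observed under this context
    uint8_t cnt[4];    // their counts
    uint8_t used = 0;

    void update(uint8_t c) {
        for (int i = 0; i < used; ++i)
            if (sym[i] == c) { if (cnt[i] < 255) ++cnt[i]; return; }
        if (used < 4) { sym[used] = c; cnt[used] = 1; ++used; return; }
        // Overflow: a real model would promote this context to a hashed table here.
    }

    int count(uint8_t c) const {
        for (int i = 0; i < used; ++i)
            if (sym[i] == c) return cnt[i];
        return 0;
    }
};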
Are you sure about the CCM details? Christian sometimes said a bit about it, but never that much. High speeds on redundant data don't necessarily involve LZ; it could be the effect of increased cache efficiency (only a few contexts, compared to e.g. precompressed data).
What is the difference between an LZ submodel and a match model, in your terminology? Do you mean something like a built-in LZ preprocessor?
I've got some ideas for implementing these things. You can also implement a "match model" without an external LZ buffer, e.g. by storing chains which contain additional hashes for collision detection.
Without a match model CMM is faster. With better data structures for the higher orders it will be faster still...
On data where CCM can't use its filters, CMM outperforms it in most cases (but not that drastically). Christian did a really good job.
I would suggest you do this, with CM of course. But something non-PAQish.
> @osmanturan: there are more things which can be done, especially with higher orders.
Yes, I know. It surely helps.
> Are you sure about the CCM details?
Yes, I'm sure, because I have collected this information by reading all of his posts in this forum. Note that in the beginning of CCM he talked about an "LZ layer" a couple of times.
> What is the difference between an LZ submodel and a match model, in your terminology? Do you mean something like a built-in LZ preprocessor?
We normally build a match model for predicting the next bit. I would describe an LZ submodel like this: under certain conditions (such as a long match or a small LZ offset), totally switch off the other models. The compressor then becomes an LZ literal coder. The difference, for me, is that we only use this LZ layer under those conditions. Also, I think collecting statistics from the matched phrase can still benefit the CM side. By doing this, we don't let LZ break the CM predictions.
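To make the distinction concrete, here is a rough sketch (hypothetical code, not taken from CCM, CMM or any other compressor mentioned here): an ordinary match model that predicts the next byte from the previous occurrence of the context, plus the kind of condition under which this "LZ layer" would switch the other models off.
Code:
#include <cstdint>
#include <vector>

struct MatchModel {
    std::vector<uint32_t> table;   // context hash -> position right after that context
    uint32_t match_ptr = 0;        // mirrors the position inside the previous occurrence
    uint32_t match_len = 0;

    MatchModel() : table(1 << 22, 0) {}

    // Call once per coded byte (pos >= 1); 'pos' is the index of the next byte to
    // code and 'hash' is a hash of the context ending at 'pos'.
    void update(const std::vector<uint8_t>& buf, uint32_t pos, uint32_t hash) {
        if (match_len && match_ptr + 1 < pos && buf[match_ptr] == buf[pos - 1]) {
            ++match_ptr; ++match_len;              // the prediction held: extend the match
        } else {
            match_ptr = table[hash & (table.size() - 1)];
            match_len = match_ptr ? 1 : 0;         // start a new candidate match
        }
        table[hash & (table.size() - 1)] = pos;
    }

    // Predicted next byte, or -1 when there is no active match.
    int predicted(const std::vector<uint8_t>& buf) const {
        return match_len ? buf[match_ptr] : -1;
    }

    // The "LZ layer" switch described above: with a long match at a small offset,
    // the other models could be skipped and their statistics updated afterwards.
    bool lz_mode(uint32_t pos) const {
        return match_len >= 32 && pos - match_ptr < (1u << 16);
    }
};
The point is that lz_mode() would short-circuit the mixer only under those conditions; the rest of the time the match prediction is mixed in as usual, so the CM predictions are not disturbed.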
> I've got some ideas for implementing these things.
I'm sure you always have good ideas. I want to see running versions of them.
> Without a match model CMM is faster.
Surely a match model slows down a CM coder.
> Christian did a really good job.
Yes, I meant that before: it's well designed for common files. He always said that his filters are only a few lines, but the detection algorithm is much more.
In the last few days I've tried to join LZP and LZ77 with CM:
1. I've joined flzp with lpaq8 (match lengths encoded using an order-2 arithmetic coder). I've achieved about 10% faster compression, but 10% bigger files (the results are moderate and depend on the minimal match length).
2. I've also joined QuickLZ with lpaq8. The compression ratio was too bad.
10,595,869 bytes vs 10,886,984 bytes
Code:
CMM4 v0.1f by C. Mattern  Jul  1 2008 -- Order6,4-0 context mixing coder, 118158 kB allocated
(x86 transform applied to AcroRd32.exe and MSO97.DLL)

                             with match model              without match model
File          Original size  Compressed    bpc    Time     Compressed    bpc    Time
A10.jpg             842,468     828,527   7.87   4.27s        828,442   7.87   4.13s
AcroRd32.exe      3,870,784   1,184,968   2.45  16.06s      1,207,298   2.50  14.90s
english.dic       4,067,439     451,105   0.89  14.34s        463,227   0.91  12.89s
FlashMX.pdf       4,526,946   3,650,013   6.45  21.45s      3,660,702   6.47  20.58s
FP.LOG           20,617,071     426,899   0.17  68.61s        512,532   0.20  63.82s
MSO97.DLL         3,782,416   1,596,934   3.38  16.81s      1,618,920   3.42  15.71s
ohs.doc           4,168,192     748,289   1.44  15.14s        783,433   1.50  14.37s
rafale.bmp        4,149,414     742,157   1.43  14.99s        742,851   1.43  13.44s
vcfiu.hlp         4,121,418     511,397   0.99  14.93s        576,707   1.12  13.63s
world95.txt       2,988,578     455,580   1.22  11.11s        492,872   1.32  10.22s
Total                        10,595,869                    10,886,984
With match model:     real 3m21.906s   user 3m19.324s   sys 0m1.596s
Without match model:  real 3m7.444s    user 3m5.196s    sys 0m1.728s
Note that CMM would be even faster; I only commented out the match finding (the most time-consuming part), since removing the rest would require more work (some things are joined in a tricky way).
Bytewise LZP is bad here, since the length codes interfere with the original alphabet. That means your context is destroyed right after a match. It would be better to use a string substitution (a byte maps to a unique, frequently appearing string). The mapping could slowly adapt. That way you virtually increase the context while lowering the file size.
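For illustration only, a static two-pass sketch of the substitution idea (made-up names; the adaptive variant would refresh the mapping as the statistics change): byte values that never occur in a block are reassigned to its most frequent digrams, so a replaced pair still looks like one ordinary symbol to the models rather than an alien length code.
Code:
#include <cstdint>
#include <vector>
#include <algorithm>
#include <utility>

// Map unused byte values to the most frequent digrams of the block.
// 'dict' receives the (code, digram) pairs needed to undo the substitution.
std::vector<uint8_t> digram_substitute(const std::vector<uint8_t>& in,
                                       std::vector<std::pair<uint8_t, uint16_t>>& dict) {
    std::vector<bool> used(256, false);
    std::vector<uint32_t> freq(65536, 0);
    for (size_t i = 0; i < in.size(); ++i) {
        used[in[i]] = true;
        if (i + 1 < in.size()) ++freq[(in[i] << 8) | in[i + 1]];
    }
    std::vector<uint16_t> order(65536);                  // digrams sorted by frequency
    for (uint32_t d = 0; d < 65536; ++d) order[d] = (uint16_t)d;
    std::sort(order.begin(), order.end(),
              [&](uint16_t a, uint16_t b) { return freq[a] > freq[b]; });
    std::vector<int> code(65536, -1);                    // digram -> replacement byte
    size_t next = 0;
    for (int c = 0; c < 256; ++c)
        if (!used[c] && next < order.size() && freq[order[next]] > 1) {
            code[order[next]] = c;
            dict.push_back({(uint8_t)c, order[next]});
            ++next;
        }
    std::vector<uint8_t> out;                            // greedy left-to-right replacement
    for (size_t i = 0; i < in.size(); ) {
        if (i + 1 < in.size() && code[(in[i] << 8) | in[i + 1]] >= 0) {
            out.push_back((uint8_t)code[(in[i] << 8) | in[i + 1]]);
            i += 2;
        } else {
            out.push_back(in[i++]);
        }
    }
    return out;
}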
BTW: cmm1 is a CM/LZP hybrid.
Tips:
Is it possible to create an asymmetric CM?
Like 192 MB for compression and 32 MB for decompression?
Only non-symmetric compressors have a chance in practical compression.
1536 MB for compression and 1500 MB for decompression is not a good idea.
Use CM carefully.
Does the compressor also filter non-redundant data?
Compression time is not so important.
Decompression time is very important.
> Is it possible to create an asymmetric CM?
> Like 192 MB for compression and 32 MB for decompression?
Yes, it's possible to make a compressor which has different speeds for its compression and decompression stages. But I'm not sure about memory usage.
> Only non-symmetric compressors have a chance in practical compression.
I don't think so. Total time is what matters for me: compression + transmission + decompression. Some asymmetric compressors have a very time-consuming compression stage.
> 1536 MB for compression and 1500 MB for decompression is not a good idea.
More memory surely helps statistical compressors, i.e. PPM and CM. But a well designed statistical compressor is generally sufficient with under 512 MB of memory. Note that most people have at least 256 MB of RAM. BTW, why does nobody use file-mapped memory as a fallback? Windows supports it natively.
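As a rough sketch of what I mean (a hypothetical helper, error handling trimmed): the model's big tables could be backed by a temporary file mapping instead of plain heap memory, so Windows pages them in and out when physical RAM is short.
Code:
#include <windows.h>
#include <cstddef>

// Back a large model buffer with a temporary file mapping instead of the heap.
// The caller keeps 'file' and 'mapping' to release the view later.
void* alloc_file_mapped(size_t size, HANDLE& file, HANDLE& mapping) {
    char dir[MAX_PATH], path[MAX_PATH];
    GetTempPathA(MAX_PATH, dir);
    GetTempFileNameA(dir, "cm", 0, path);
    file = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                       FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, NULL);
    if (file == INVALID_HANDLE_VALUE) return NULL;
    mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                 (DWORD)((unsigned long long)size >> 32),
                                 (DWORD)(size & 0xFFFFFFFFu), NULL);
    if (!mapping) return NULL;
    return MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, size);
}
On a 256 MB machine the view would simply be paged by the OS, at the obvious cost of speed.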
> Use CM carefully.
Surely
> Does the compressor also filter non-redundant data?
I think a universal compressor should take care of different file structures itself rather than rely on generic filter algorithms.
> Compression time is not so important.
> Decompression time is very important.
You are surely a lover of LZ-based compressors!
I already presented a basic approach to asymmetric CM... Search this forum for M01a.
> Is it possible to create an asymmetric CM?
> Like 192 MB for compression and 32 MB for decompression?
> Yes, it's possible to make a compressor which has different speeds for its compression and decompression stages. But I'm not sure about memory usage.
I didn't know about M01 from toffer, but he explains some ideas about asymmetric CM there. It's GPL v3 source code.
> Only non-symmetric compressors have a chance in practical compression.
> I don't think so. Total time is what matters for me: compression + transmission + decompression. Some asymmetric compressors have a very time-consuming compression stage.
Yes, it's important, but most projects which use compression (like package distribution) do not like aggressive compressors.
Even PPM and DMC are not adopted in actual projects.
The main problem?
Symmetric algorithms, and not very optimized ones.
encode.su is one of the very rare forums where people develop such algorithms.
> 1536 MB for compression and 1500 MB for decompression is not a good idea.
> More memory surely helps statistical compressors, i.e. PPM and CM. But a well designed statistical compressor is generally sufficient with under 512 MB of memory. Note that most people have at least 256 MB of RAM. BTW, why does nobody use file-mapped memory as a fallback? Windows supports it natively.
But 512 MB is still very high and very limiting. The compressor should offer a lot of options, like FreeArc: symmetric and asymmetric modes with a memory usage option.
It may be possible to use BCM to do this.
> Use CM carefully.
> Surely.
> Does the compressor also filter non-redundant data?
> I think a universal compressor should take care of different file structures itself rather than rely on generic filter algorithms.
That is very hard to do and requires a lot of time. Sometimes it is more practical to use some filters.
> Compression time is not so important.
> Decompression time is very important.
> You are surely a lover of LZ-based compressors!
No, I'm not; there are a lot of algorithms besides the LZ family.
Look at DMC, PPM, CM and SR.
> I didn't know about M01 from toffer, but he explains some ideas about
> asymmetric CM there. It's GPL v3 source code.
Yes, it's a good example of asymmetric CM. Also, there is an asymmetric binary coder which could be used instead of an arithmetic coder. AFAIK it's still a bit slower than a current arithmetic coder. Still in development...
> Yes, it's important, but most projects which use compression (like
> package distribution) do not like aggressive compressors.
My laptop is fast enough (Core 2 Duo 2.2 GHz, 2 GB RAM), but its raw disk write speed is around 20-25 MB/s, so practically I don't need a decompressor that does more than 20 MB/s. For me, about 1 MB/s decompression speed or more with strong compression is sufficient most of the time.
> Even PPM and DMC are not adopted in actual projects.
Did you know that WinRAR uses PPM for text-based files? Also, WinZip 11 has an option for PPM in ZIP archives.
> The main problem?
Lazy developers
> Symmetric algorithms, and not very optimized ones.
Symmetric statistical algorithms are easier than asymmetric ones. That's the reason why we see lots of symmetric compressors.
> encode.su is one of the very rare forums where people develop such algorithms.
Think about the WinRAR and WinZip 11 PPM usage again.
> But 512 MB is still very high and very limiting. The compressor should offer a lot
> of options, like FreeArc: symmetric and asymmetric modes with a memory usage
> option.
If I can extract a compressed file with under 512 MB of RAM, there is no problem for me. If I had 256 MB, I would accept slow decompression due to file-mapped memory at that level. For me, 256 MB of RAM is very insufficient for users on WinXP, and I haven't even mentioned Vista, which needs approximately 1 GB of RAM.
> It may be possible to use BCM to do this.
Due to BWT's nature, it's a bit hard to improve without improving its sorting algorithm.
> No, I'm not; there are a lot of algorithms besides the LZ family.
> Look at DMC, PPM, CM and SR.
I already knew about them. Thanks anyway.
Well, maybe you are right. Actually, there are a lot of asymmetric compressors which people can choose from, like 7-Zip, WinZip, WinRAR and FreeArc.
Maybe encode can focus on high compression, especially for large packages and a lot of mixed data.
Although the compressor would probably stay restricted to compression lovers.
But remember, use CM carefully.
Already done. As I wrote several posts before:
I've joined flzp with lpaq8 (match lengths encoded using an order-2 arithmetic coder). I've achieved about 10% faster compression, but 10% bigger files (the results are moderate and depend on the minimal match length).
It's not so good. I've gotten similar or better results by disabling 2 APM stages in lpaq.
I agree. There is no need to think about asymmetric compression if you can improve the speed of symmetric compression/decompression (with the same compression ratio).