Hi everyone,
I've decided to develop a new archive format: Bit Archive Format. The design goals are:
1-Store only the data that the decompression process actually needs. File timestamps/flags, encryption, large file support (>2 GB), Unicode filenames, archive splitting etc. are not necessary. We only need the filename, file size, compressor identity and a data validation hash (CRC32 would be enough).
2-Optimize the archive format for DVD-like media. The decompressor should read the archive with sequential access only (no seeking required). Also, the file list should be fetchable very quickly with sequential access (a ZIP-like approach, with the central directory at the end of the archive, is not acceptable).
3-Very extreme conditions are not a problem for the compression process: it can eat a lot of memory and CPU horsepower. The decompression process, however, is very sensitive. It must not be very slow, and its memory usage should stay around 100-150 MB. On the other hand, compression should be better than ZIP in most cases.
4-Patent issues are a very big problem, so arithmetic coding is not acceptable for the entropy coding stage; a range coder is more suitable. The code itself may or may not be open source (this is not clear at this time).
5-Executable, JPEG, WAV etc. prefilters are not necessary.
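To make goal 1 and goal 2 concrete, here is a minimal sketch of what a sequentially readable per-file entry could look like. All field names and widths are my assumption, not a spec; the point is that each entry carries only the four pieces of data listed above and can be read front-to-back with no seeking.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical per-file entry: just the four fields the decompressor needs.
struct Entry {
    std::string name;    // filename (length-prefixed bytes)
    uint64_t    size;    // uncompressed size
    uint8_t     method;  // compressor identity
    uint32_t    crc32;   // validation hash of the raw data
};

// Fixed-width little-endian writes keep the format simple and seek-free.
template <typename T>
void put(std::ostream& out, T v) {
    unsigned char b[sizeof(T)];
    for (size_t i = 0; i < sizeof(T); ++i) b[i] = (unsigned char)(v >> (8 * i));
    out.write((const char*)b, sizeof(T));
}

template <typename T>
T get(std::istream& in) {
    unsigned char b[sizeof(T)];
    in.read((char*)b, sizeof(T));
    T v = 0;
    for (size_t i = 0; i < sizeof(T); ++i) v |= (T)((uint64_t)b[i] << (8 * i));
    return v;
}

void writeEntry(std::ostream& out, const Entry& e) {
    put<uint16_t>(out, (uint16_t)e.name.size());
    out.write(e.name.data(), (std::streamsize)e.name.size());
    put<uint64_t>(out, e.size);
    put<uint8_t>(out, e.method);
    put<uint32_t>(out, e.crc32);
}

Entry readEntry(std::istream& in) {
    Entry e;
    uint16_t n = get<uint16_t>(in);
    e.name.resize(n);
    if (n) in.read(&e.name[0], n);
    e.size   = get<uint64_t>(in);
    e.method = get<uint8_t>(in);
    e.crc32  = get<uint32_t>(in);
    return e;
}
```

If all entries are written back-to-back at the front of the archive, the decompressor can build the complete file list in one sequential pass before any compressed data is touched.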
A careful reader can guess why this kind of format is needed. The answer is simple: game development. A game or a simulation will store all of its data, such as collision maps, textures, videos and audio files, in this format. Notice that most of these files are binary.
I think a ROLZ + Context Mixing + Range Coding approach is the best choice for the compression scheme: ROLZ is highly asymmetric (slow compression / fast decompression) and compresses well, Context Mixing gives very effective compression, and Range Coding is fast and patent-free.
QUAD seems a very good place to start; I like its fast and effective implementation. But I would like to add context mixing instead of PPM, so PAQ looks very promising at this point. Unfortunately, the newer versions use extremely large amounts of memory, and they are highly symmetric, so decompression can take as long as compression.
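For readers unfamiliar with the context-mixing idea, here is a toy version of the PAQ-style mixer: several models each predict P(next bit = 1), the mixer combines them in the stretched (logit) domain, and weights are learned online. This is a deliberate simplification of what PAQ actually does (real versions use integer arithmetic, many contexts, and per-context weight sets); names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Mixer {
    std::vector<double> w;    // one weight per model
    std::vector<double> st;   // stretched inputs from the last mix()
    double lr;                // learning rate

    Mixer(size_t n, double rate = 0.02) : w(n, 0.0), st(n, 0.0), lr(rate) {}

    static double stretch(double p) { return std::log(p / (1.0 - p)); }
    static double squash(double x)  { return 1.0 / (1.0 + std::exp(-x)); }

    // Combine model probabilities into one prediction (weighted sum of logits).
    double mix(const std::vector<double>& p) {
        double dot = 0.0;
        for (size_t i = 0; i < w.size(); ++i) {
            st[i] = stretch(p[i]);
            dot += w[i] * st[i];
        }
        return squash(dot);
    }

    // After the real bit is seen, nudge weights toward the models that
    // predicted it (online logistic regression).
    void update(double predicted, int bit) {
        double err = bit - predicted;
        for (size_t i = 0; i < w.size(); ++i)
            w[i] += lr * err * st[i];
    }
};
```

The catch, as noted above, is symmetry: the decoder must run exactly the same models and mixer to reproduce each probability, which is why pure CM coders decompress as slowly as they compress. Restricting CM to the ROLZ literal/match decisions is one way to keep the decoder cheap.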
I would like to thank Ilia Muraviev and Matt Mahoney for their excellent work.
Do you have any ideas about this work?
P.S.: I'm not an expert in compression algorithms.
Osman Turan