English plaintext compression task.
1. Public dataset:
- the decompressor size has to be added to the compressed size;
- this encourages compiler tweaks for executable size and discourages speed optimizations such as inlining/loop unrolling (compare lzma.exe.7z: 51,165 bytes vs. zstd.exe.7z: 383,490 bytes);
- encourages overtuning (participants can tune their entries to the specific test data);
- embedded dictionaries are essentially ruled out, since their size counts against the result;
2. Private dataset:
- decompressor size would still have to be limited for technical reasons (<16MB?);
- embedded dictionaries can be used fairly;
- embedded dictionaries do improve compression results (see brotli vs. zstd benchmarks);
- we can publish hashes of the private files in advance, then publish the files themselves after the end of the contest.
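The hash-in-advance idea for the private dataset could work roughly like this. It is a minimal sketch, assuming SHA-256 as the hash and hypothetical helper names (file_sha256, commitment_list, verify); it does not reflect any actual contest tooling.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def commitment_list(paths):
    """Hypothetical helper: '<digest>  <name>' lines to publish before the contest."""
    return [f"{file_sha256(p)}  {p.name}" for p in paths]


def verify(path: Path, published_digest: str) -> bool:
    """Hypothetical helper: after the reveal, check a released file against its hash."""
    return file_sha256(path) == published_digest
```

The organizers would publish the output of commitment_list before the start; once the files are released, anyone can run verify on each file against the published digest to confirm the test data was not changed during the contest.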
It's for an actual contest that is being prepared. Please vote.