While working with emarty on another project, which I will write about separately, we noticed that we could fairly substantially improve decompression speed by strategically resolving ties. More specifically, emarty wrote a byte-aligned compressor similar in style to LZ4. Its nearly optimal parser encounters a fairly substantial number of ties, i.e. variations of the encoding that do not affect the size of the resulting compressed data. Our analysis of the data format showed that decompressing literals tends to require less processing, which suggested that a preference for encoding literals in the compressor could improve decompression speed. Working with a particular decompressor, we managed to get an overall improvement in decompression speed on the order of 5-7%.
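To illustrate the idea, preferring literals on ties can be a one-line property of a shortest-path parser: when a match candidate is only as good as the literal path, keep the literal. Here is a toy sketch of that tie-break (the cost model, the `lit_cost`/`match_cost` values and the `matches` table are made-up illustrations for clarity, not the real LZ4 format or our actual parser):

```python
def parse(data, matches, lit_cost=1, match_cost=3):
    """Toy backward shortest-path parse.

    matches: dict mapping position -> iterable of match lengths usable there.
    Returns (total compressed size, list of ('lit'|'match', length) ops).
    """
    n = len(data)
    cost = [0] * (n + 1)      # cost[i] = cheapest encoding of data[i:]
    choice = [None] * n
    for i in range(n - 1, -1, -1):
        # evaluate the literal first, so an equal-cost match does NOT win
        cost[i] = lit_cost + cost[i + 1]
        choice[i] = ('lit', 1)
        for mlen in matches.get(i, ()):
            c = match_cost + cost[i + mlen]
            if c < cost[i]:            # strict '<' is the tie-break
                cost[i] = c
                choice[i] = ('match', mlen)
    # walk the chosen ops forward
    ops, i = [], 0
    while i < n:
        kind, length = choice[i]
        ops.append((kind, length))
        i += length
    return cost[0], ops
```

With these toy costs, a 3-byte match that costs exactly as much as three literals loses the tie, so the parse stays all-literal; a match that is strictly cheaper still wins as usual.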

Of course, this involved working with a particular decompressor. If we had analysed every code path of our decompressor and identified the specific cost of each one, we could probably have gained even more decompression speed. However, we did not do that. Instead, emarty came up with a useful heuristic: the average length of groups of literals and matches inversely correlates with the number of compression tokens used, so he designed an algorithm to resolve ties in a way that simply reduces the number of tokens. To see how useful this can be, emarty wrote his own nearly optimal compressor for LZ4 that uses this method of tie resolution. His work is available here: https://github.com/emmanuel-marty/lz4ultra
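The token-reducing heuristic can be sketched the same way by making the parser's cost a pair compared lexicographically: compressed size first, then number of matches, which roughly tracks the LZ4 token count (each LZ4 token covers one run of literals plus one match, so fewer matches means fewer tokens and longer groups). Again, this is a toy sketch under a made-up cost model, not the actual lz4ultra algorithm:

```python
def parse_min_tokens(data, matches, lit_cost=1, match_cost=3):
    """Toy parse minimising (compressed bytes, match count) lexicographically.

    matches: dict mapping position -> iterable of match lengths usable there.
    Returns ((size, match_count), list of ('lit'|'match', length) ops).
    """
    n = len(data)
    cost = [(0, 0)] * (n + 1)          # (compressed bytes, match count)
    choice = [None] * n
    for i in range(n - 1, -1, -1):
        b, t = cost[i + 1]
        cost[i] = (lit_cost + b, t)    # a literal adds no new token
        choice[i] = ('lit', 1)
        for mlen in matches.get(i, ()):
            b, t = cost[i + mlen]
            cand = (match_cost + b, t + 1)
            if cand < cost[i]:         # tuple compare: size first, then tokens
                cost[i] = cand
                choice[i] = ('match', mlen)
    ops, i = [], 0
    while i < n:
        kind, length = choice[i]
        ops.append((kind, length))
        i += length
    return cost[0], ops
```

For example, two 3-byte matches that together cost exactly as much as six literals lose on the token count, so the all-literal parse is emitted; a match that makes the output strictly smaller still wins outright.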

For fair testing, I compressed a bunch of small files (all files < 64K; see more about my corpus here: https://encode.su/threads/3001-State...ll=1#post58588 ) using several available optimal compressors for LZ4 and measured the precise decompression time for each of them using my speed-optimized decompressor. This is what I observed:

(the units of time are Z80 t-states; the lower, the better)

compression                              ratio, %   decompression time per byte
lz4 -19 (ver. 1.8.3)                     58.47%     32.81
lz4 -19 --favor-decSpeed (ver. 1.8.3)    59.26%     32.59
lz4x -9 (ver. 1.60)                      58.47%     32.58
smallz4 -9 (ver. 1.3)                    58.47%     32.58
lz4ultra (ver. 1.0.2)                    58.50%     32.33

So, these are my observations about the results of these tests:

- LZ4 does not seem to do tie resolution in a consistent fashion.
- LZ4X and Smallz4 generate data that somehow has some partial tie resolution, so it decompresses 0.7% faster with my decompressor.
- emarty's compressor LZ4Ultra generates very slightly larger files than optimal (very close to LZ4 -19). However, the data compressed with LZ4Ultra is about 1.5% faster to decompress in our tests.
- Interestingly, if we use the LZ4 option --favor-decSpeed, we get a similar number of tokens to LZ4X or Smallz4, but a significantly lower compression ratio.

Is this win substantial? I am guessing the answer depends on your application, but, quite importantly, it comes "free", in the sense that it does not require you to compromise on compression ratio. It would be interesting to know what you think about this style of optimizing compressed data: would you consider doing something similar in your project (and if not, why not)? It would also be very interesting if you could do some precise measurements of decompression times for data packed using these optimal compressors, as well as emarty's compressor, to see how much of a difference this makes for decompressors different from mine.