@Shelwien, basically, yes, more complex tuning can be done. However, the more complex the tuning gets, the more decompressor-specific it is likely to become. I was hoping to see whether simpler, more generic heuristics can be used to more or less uniformly affect the resulting decompression times. For example, we saw that changing the minimum match length had a very significant effect on decompression speed, so a possible heuristic recommendation for faster decompression could be to have the compressor avoid matches that are too short.
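To illustrate the kind of generic heuristic I mean, here is a minimal sketch (not code from lz4, smallz4 or lz4ultra) of a greedy parser where the minimum accepted match length is a tunable parameter; raising it trades a little ratio for fewer, longer matches:
Code:
// A minimal sketch (not actual lz4/smallz4/lz4ultra code): a greedy LZ-style
// parser in which the minimum accepted match length is a tunable parameter.
#include <cstdio>
#include <cstdint>
#include <vector>

struct Token { bool isMatch; uint8_t literal; size_t length; size_t distance; };

// Brute-force longest-match search, purely for illustration.
static void longestMatch(const std::vector<uint8_t>& d, size_t pos,
                         size_t& bestLen, size_t& bestDist)
{
    bestLen = 0; bestDist = 0;
    for (size_t start = 0; start < pos; ++start)
    {
        size_t len = 0;
        while (pos + len < d.size() && d[start + len] == d[pos + len])
            ++len;
        if (len > bestLen) { bestLen = len; bestDist = pos - start; }
    }
}

static std::vector<Token> parseGreedy(const std::vector<uint8_t>& d, size_t minMatch)
{
    std::vector<Token> tokens;
    size_t pos = 0;
    while (pos < d.size())
    {
        size_t len, dist;
        longestMatch(d, pos, len, dist);
        if (len >= minMatch)                        // reject matches that are "too short"
        {
            tokens.push_back({true, 0, len, dist});
            pos += len;
        }
        else
        {
            tokens.push_back({false, d[pos], 0, 0});
            ++pos;
        }
    }
    return tokens;
}

int main()
{
    const char text[] = "abcabcabcxyzxyz";
    std::vector<uint8_t> data(text, text + sizeof(text) - 1);
    // A higher minimum match length produces fewer but longer matches,
    // at the cost of encoding a few extra literals.
    for (size_t minMatch : {2, 4})
        printf("minMatch=%zu -> %zu tokens\n", minMatch, parseGreedy(data, minMatch).size());
}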
@stbrumme, I am targeting 8-bit platforms and my decompressor is written in Z80 assembly. The numbers I gave are actually Z80 clock counts measured in a specialized emulator.
For all intents and purposes, these are exact times; there is no hardware interference to overcomplicate things. Effectively, every combination of token parameters corresponds to an exact time; for a specific file, these times are all added up and measured by the emulator. How precisely this corresponds to the performance on real hardware depends on the hardware. On many ZX Spectrum clones the actual run times would be exactly as in the emulator. On some original models the performance depends on which memory is being used, so it may not be 100% precise there. Of course, on a PC you are affected by cache misses, branch mispredictions and other similar complications, which make this kind of precise timing impossible. This is why I really wanted to know whether simply reducing the number of tokens would be noticeable during decompression on a PC.
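To make the "exact time per token" point concrete, here is a toy cost model in the spirit of what the emulator measures; the cycle numbers below are placeholders, not the real figures for my decompressor:
Code:
// A toy per-token cost model: each token shape maps to a fixed number of Z80
// T-states, and the costs of all tokens in a file are simply added up.
// The cycle numbers are placeholders, not the real figures for any decompressor.
#include <cstdio>
#include <vector>

struct Token { size_t literals; size_t matchLen; bool longOffset; };

static long cyclesFor(const Token& t)
{
    long cost = 100;                    // fixed per-token overhead (placeholder)
    cost += 21 * (long)t.literals;      // per copied literal byte  (placeholder)
    cost += 21 * (long)t.matchLen;      // per copied match byte    (placeholder)
    if (t.longOffset) cost += 30;       // extra cost of a long offset (placeholder)
    return cost;
}

int main()
{
    std::vector<Token> stream = { {5, 8, false}, {0, 64, true}, {12, 4, false} };
    long total = 0;
    for (const Token& t : stream)
        total += cyclesFor(t);
    printf("total decompression cost: %ld T-states\n", total);
}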
I originally struggled to reproduce stbrumme's results; however, after ensuring that all my executables have the correct versions and command-line arguments, I am seeing the following:
Code:
                                                          compressed      no. of tokens
lz4ultra (ver.1.0.0)                                      41,928,203      10,826,799  (9.23634 bytes/token)
lz4ultra (ver.1.0.1)                                      41,927,804      10,825,609  (9.23736 bytes/token)
lz4ultra -B7 (ver.1.0.2)                                  41,913,406      10,824,901  (9.23796 bytes/token)
lz4 -19 -BD --no-frame-crc (ver.1.8.2)                    41,913,377      11,025,738  (9.06969 bytes/token)
lz4 -19 -BD --no-frame-crc (ver.1.8.3)                    41,913,377      11,025,738  (9.06969 bytes/token) - basically, identical to 1.8.2
lz4 -19 -BD --no-frame-crc --favor-decSpeed (ver.1.8.3)   42,298,381      10,547,798  (9.48065 bytes/token)
smallz4 -9 (ver.1.3)                                      41,913,367      10,753,325  (9.29945 bytes/token)
I am unsure why lz4ultra produces quite so many tokens on enwik8 (in my tests on small files <64K, lz4ultra tends to produce far fewer tokens than smallz4). We will be looking into this together with emarty.
How does smallz4 end up with a preference for fewer tokens? Do you already have a mechanism in place for consistent resolution of ties?
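For reference, here is a sketch of one way such a tie-break could be implemented in a backward optimal parse; this is only my assumption of how it might work, not necessarily what smallz4 actually does:
Code:
// Not smallz4's actual code - just a sketch of one possible tie-break rule in
// a backward optimal parse: compare (compressed size, token count) pairs
// lexicographically, so that among equally small parses the one with fewer
// tokens (i.e. longer matches) wins, and ties are always resolved the same way.
#include <cstdint>

struct Cost
{
    uint64_t bytes;   // compressed size from this position to the end of file
    uint64_t tokens;  // number of tokens used to achieve that size

    bool betterThan(const Cost& other) const
    {
        return bytes < other.bytes ||
               (bytes == other.bytes && tokens < other.tokens);
    }
};

// Inside the DP loop one would then write something like:
//   Cost candidate{ cost[pos + len].bytes + tokenSize(len, dist),
//                   cost[pos + len].tokens + 1 };
//   if (candidate.betterThan(cost[pos]))
//       cost[pos] = candidate;
int main()
{
    Cost a{1000, 120};   // same compressed size...
    Cost b{1000, 115};   // ...but fewer tokens
    return b.betterThan(a) ? 0 : 1;   // b wins the tie-break
}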