Despite not having finished many of the things I had planned for the next release, I decided to push it out earlier because Shrinker turned out to be a cool thing.
First, some numbers:
Code:
e:\projects\benchmark04\tst>fsbench default shrinker -i3 -s1 -b131072 -m4096 -t2
..\scc.tar
memcpy = 78 ms (2591 MB/s), 211927552->211927552
Codec     version  args   Size (Ratio)         C.Speed      D.Speed
LZ4       r59      12     106569728 (x 1.99)   C: 185 MB/s  D: 598 MB/s
LZO       2.05     1x1    104497152 (x 2.03)   C: 161 MB/s  D: 256 MB/s
QuickLZ   1.5.1b6  1      100093952 (x 2.12)   C: 153 MB/s  D: 132 MB/s
Snappy    1.0.5           108097536 (x 1.96)   C:  89 MB/s  D: 304 MB/s
Shrinker  r4              98582528  (x 2.15)   C:  88 MB/s  D: 309 MB/s
Codec     version  args   Size (Ratio)         C.Speed      D.Speed
done... (3x1 iteration(s))
Results are quite similar in most tests; I consider Shrinker to be the strongest of the fast codecs.
There are issues, though: it doesn't work correctly on my 32-bit Linux. It's quite possible that it doesn't work correctly on 32-bit systems at all.
I made another noteworthy addition, LZX. Sadly, lzx_compress doesn't come with a decompressor; I want to add one from some other library later, but for now there's just a dummy, so don't be surprised by how fast it is.
As for internal changes:
The multithreading code is much different: I now schedule work dynamically instead of statically. The implementation is still very basic; I use mutexes and didn't try critical sections or custom solutions, that will have to wait. Also, I don't have good job size adjustment; I use a default of 256 KB, which has to do for now. A rough sketch of the idea follows below.
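Here is a minimal sketch of the dynamic scheduling idea, assuming a shared offset counter protected by a mutex and a fixed 256 KB job size; the type and function names are mine for illustration only, not fsbench's actual code.
Code:
// Sketch only: each thread grabs the next fixed-size job under a mutex
// instead of receiving one big, statically assigned slice up front.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct JobQueue {
    const char* data;
    size_t total;
    size_t next;        // offset of the next unclaimed job
    size_t job_size;    // default job size: 256 KB
    std::mutex lock;

    JobQueue(const char* d, size_t t, size_t js = 256 * 1024)
        : data(d), total(t), next(0), job_size(js) {}

    // Hands out the next chunk; returns false when no work is left.
    bool get_job(size_t& off, size_t& len) {
        std::lock_guard<std::mutex> guard(lock);
        if (next >= total) return false;
        off = next;
        len = std::min(job_size, total - next);
        next += len;
        return true;
    }
};

static void worker(JobQueue& q) {
    size_t off, len;
    while (q.get_job(off, len)) {
        // process_block(q.data + off, len);  // hypothetical per-job work
        (void)off; (void)len;
    }
}

int main() {
    std::vector<char> input(4 * 1024 * 1024);
    JobQueue q(input.data(), input.size());
    std::thread t1(worker, std::ref(q)), t2(worker, std::ref(q));
    t1.join();
    t2.join();
}
The usual trade-off is that larger jobs mean less locking overhead while smaller jobs balance load better, which is why a single fixed 256 KB default is only a stopgap.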
I still haven't decided what to do with the measured overhead of my own code; right now I do nothing, but later I may subtract it from the results.
And, sadly, I have to admit that I don't have much confidence in the correctness of the code; I don't feel it's well tested yet. This is the thing that makes me least comfortable about releasing it now.
To enable the multithreading changes, I had to modify the memory layout. The new code is, in general, safer, but less cache-friendly, with things scattered over memory instead of just a couple of big chunks. I noted a performance drop in some codecs and I suspect this might be the cause.
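For illustration only (these are not fsbench's actual buffers), the contrast is roughly:
Code:
#include <vector>

int main() {
    // Couple-of-big-chunks layout: contiguous, cache- and TLB-friendly.
    std::vector<char> big_chunk(64 * 1024 * 1024);

    // Per-job layout: many small buffers scattered across the heap,
    // easier to manage safely but worse for locality.
    std::vector<std::vector<char>> per_job(256, std::vector<char>(256 * 1024));

    (void)big_chunk;
    (void)per_job;
    return 0;
}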
I changed the way decompression speed is calculated: now I take into account only the amount of data that was actually decompressed. While I consider it a less important metric, it's less confusing.
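As a sketch of what "actually decompressed" means here, assuming incompressible blocks are stored verbatim and merely copied back, only bytes that really went through the decompressor feed into D.Speed; the names below are hypothetical:
Code:
#include <cstddef>

struct Block {
    size_t original_size;
    bool was_compressed;   // false = stored verbatim, skipped by the codec
};

// Sum only the blocks the codec actually had to decompress.
size_t bytes_really_decompressed(const Block* blocks, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        if (blocks[i].was_compressed)
            total += blocks[i].original_size;
    return total;
}
// D.Speed = bytes_really_decompressed(blocks, count) / elapsed_seconds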
Changelog:
Code:
[+] added LZFX
[+] added LZX (without decompressor)
[+] added Shrinker
[-] removed lzp1. It crashed on calgary.tar created with 7-zip 9.22. After a short analysis I decided that debugging would cost too much to be worth it.
[~] major refactoring and other improvements
[~] dynamic work scheduling. Up to now, threads got roughly equally sized pieces; sometimes one piece would take much longer than the others, thus skewing results in favor of codecs with more predictable performance.
[~] added QuickLZ to the list of default codecs
[~] use memcpy instead of LZ4 for warmup. It touches more memory, which is fairer to codecs that compress worse than LZ4.
[~] when calculating decompression speed, take into account only the amount of data that was actually decompressed
[~] improved measurement accuracy
[!] the algorithm often considered an incompressible last block of a file to be compressible
[!] *nix compatibility fixes
[!] fixed UCL, LZO and LZMAT crashes
Attached: source and a MinGW win64 build.