Hello, I'd like to show you the toy I built today.
It originated from RLE64, where I found and fixed a couple of suboptimalities. Then I unrolled it, and that's pretty much it.
Different variants are optimal on different machines; I may study this further. What I present below is the best so far on my current machine, so it's kind of a cheat. Still, it should comfortably beat the original on any machine.
Also, to reach high speeds you need files that fit in the cache; otherwise memory bandwidth is the limit.
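For anyone unfamiliar with word-based RLE, here is a rough sketch of the general idea. It is not the RLE64 or lrrle format and not the code from fsbench: the `(count, word)` output and the names `rle_encode` and `rle_decode` are made up for illustration, and it is Python for readability, not speed. It assumes the numeric codec argument (256, 192, 128, 64) is the run unit width in bits, as it is for RLE64.

```python
def rle_encode(data, unit=8):
    """Collapse repeated `unit`-byte words into (count, word) pairs.

    Hypothetical illustration only; real codecs emit a packed byte stream.
    """
    if len(data) % unit:
        raise ValueError("input length must be a multiple of the unit size")
    runs = []
    for i in range(0, len(data), unit):
        word = data[i:i + unit]
        if runs and runs[-1][1] == word:
            # Same word as the previous one: extend the current run.
            runs[-1] = (runs[-1][0] + 1, word)
        else:
            # Different word: start a new run of length 1.
            runs.append((1, word))
    return runs


def rle_decode(runs):
    """Expand (count, word) pairs back into the original bytes."""
    return b"".join(word * count for count, word in runs)


if __name__ == "__main__":
    # 40 zero bytes followed by 24 bytes of repeating text, 64 bytes total.
    data = b"\0" * 40 + b"abcdefgh" * 3
    for unit in (8, 32):
        runs = rle_encode(data, unit)
        assert rle_decode(runs) == data
        print(unit * 8, "bit units:", [count for count, _ in runs])
```

With 8-byte units the sample collapses into two runs of 5 and 3 words. With 32-byte units neither word repeats, so nothing is saved. That matches the trend in the results, where wider units run faster but compress less.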
Results:
pcbsd-8973% please ./fsbench -i10 memcpy memmove memset lrrle lrrle,192 lrrle,128 lrrle,64 rle64 lz4 ~/bench/scc-256K/mr
Codec    version  args
    C.Size  (C.Ratio)     E.Speed     D.Speed   E.Eff.   D.Eff.
memcpy   0
    262144  (x 1.000)   6876 MB/s           -      0e0      0e0
memmove  0
    262144  (x 1.000)   6873 MB/s           -      0e0      0e0
memset   0
    262144  (x 1.000)   13.7 GB/s           -      0e0      0e0
lrrle    0        256
    213392  (x 1.228)   7887 MB/s   11.0 GB/s   1466e6   2086e6
lrrle    0        192
    211032  (x 1.242)   6488 MB/s  10.00 GB/s   1264e6   1996e6
lrrle    0        128
    206976  (x 1.267)   5554 MB/s   9484 MB/s   1168e6   1996e6
lrrle    0        64
    202696  (x 1.293)   3375 MB/s   6308 MB/s    765e6   1430e6
RLE64    R3.00    64
    202920  (x 1.292)   2455 MB/s   5251 MB/s    554e6   1186e6
LZ4      r114
    144177  (x 1.818)    386 MB/s    820 MB/s    173e6    369e6
Codec    version  args
    C.Size  (C.Ratio)     E.Speed     D.Speed   E.Eff.   D.Eff.
done... (10*X*1) iteration(s)).
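To help read the columns, here is a small check of how the numbers relate. C.Ratio is simply original size divided by compressed size. The efficiency formula below (speed times the fraction of bytes saved) is inferred from the figures above, not taken from fsbench documentation, so treat it as an assumption. The function names are hypothetical. Only rows reported in MB/s are used, and since the printed speeds are rounded, the results match the table to within about one unit.

```python
ORIGINAL_SIZE = 262144  # bytes, from the memcpy row


def ratio(compressed_size, original_size=ORIGINAL_SIZE):
    """Compression ratio as shown in the C.Ratio column."""
    return original_size / compressed_size


def efficiency(speed_mb_s, compressed_size, original_size=ORIGINAL_SIZE):
    """Assumed formula: speed scaled by the fraction of bytes saved.

    The result is in millions, so 1466.8 corresponds to 1466e6 in the table.
    """
    return speed_mb_s * (1 - compressed_size / original_size)


if __name__ == "__main__":
    rows = [
        # (label, compressed size, E.Speed in MB/s), copied from the table
        ("lrrle,256", 213392, 7887),
        ("lrrle,64", 202696, 3375),
        ("RLE64", 202920, 2455),
        ("LZ4", 144177, 386),
    ]
    for label, size, speed in rows:
        print(f"{label:10s} ratio {ratio(size):.3f}  "
              f"E.Eff. ~{efficiency(speed, size):.0f}e6")
```

Read this way, efficiency rewards a codec for being both fast and actually saving bytes, which is why memcpy scores 0e0 despite its speed.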
The source is in the fsbench tree:
https://chiselapp.com/user/Justin_be.../dir?type=tree