First of all, blosc claims to be faster than memcpy.
This is impossible, simply because blosc is no more
than a shuffle (or transpose) + lz77.
Even the shuffle alone cannot be faster than memcpy, because
shuffle is a blocked memcpy with zero compression: every byte is still read and written once, only in a strided order.
The shuffle is applied before compression and reversed after decompression.
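
To make the "blocked memcpy" point concrete, here is a minimal sketch of a byte shuffle in plain Python. It is not blosc's actual implementation (which is vectorized C); the function names `shuffle` and `unshuffle` and the `typesize` parameter are chosen here for illustration only.

```python
import struct


def shuffle(data: bytes, typesize: int) -> bytes:
    """Byte shuffle: group byte k of every element into one contiguous run."""
    if len(data) % typesize != 0:
        raise ValueError("length must be a multiple of typesize")
    # data[k::typesize] selects byte k of each element. Concatenating the
    # slices transposes an (n_elements x typesize) byte matrix.
    return b"".join(data[k::typesize] for k in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle: scatter each run back to its strided positions."""
    if len(data) % typesize != 0:
        raise ValueError("length must be a multiple of typesize")
    n = len(data) // typesize
    out = bytearray(len(data))
    for k in range(typesize):
        out[k::typesize] = data[k * n:(k + 1) * n]
    return bytes(out)


# Four little-endian 32-bit integers: 16 bytes in, 16 bytes out.
raw = struct.pack("<4I", 1, 2, 3, 4)
shuffled = shuffle(raw, 4)
restored = unshuffle(shuffled, 4)
```

The output has exactly the same length as the input; the shuffle only moves bytes, so any size reduction has to come from the compressor that runs afterwards.
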
A multithreaded blosc must also be compared to a multithreaded memcpy.
The benchmarks at TurboPFor show that "Integer Compression" is superior to "binary compression"
like blosc in terms of both compression ratio and speed.
In those benchmarks, blosc is tested with lz4 as the compressor and with the vectorized shuffle.
Another advantage of Integer Compression is direct access to individual values without decompressing entire blocks.
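
As a rough illustration of direct access, here is a minimal sketch using fixed-width bit packing. This is an assumption made for the example and not TurboPFor's actual format or API; `pack`, `get`, and the `bits` parameter are hypothetical names.

```python
def pack(values, bits):
    """Pack non-negative integers into bytes, using `bits` bits per value."""
    packed = 0
    for i, v in enumerate(values):
        if v < 0 or v >> bits:
            raise ValueError("value does not fit in the given width")
        packed |= v << (i * bits)
    return packed.to_bytes((len(values) * bits + 7) // 8, "little")


def get(buf, bits, i):
    """Read value i by touching only the bytes that contain it."""
    start = i * bits                      # bit offset of value i
    first = start // 8                    # first byte holding part of it
    last = (start + bits - 1) // 8        # last byte holding part of it
    chunk = int.from_bytes(buf[first:last + 1], "little")
    return (chunk >> (start % 8)) & ((1 << bits) - 1)


# Five values that each fit in 5 bits: 25 bits, stored in 4 bytes.
buf = pack([3, 17, 0, 31, 9], 5)
fourth = get(buf, 5, 3)
```

Because every value sits at a known bit offset, `get` reads at most a few bytes regardless of how many values are stored. A shuffle + LZ77 block has to be decompressed as a whole before any single value can be read.
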