Hello everyone,

Nice to hear from you again, and happy new year! I've been off on other projects, but I finally have some time to spend on my compression library, hence this post!

As a *DISCLAIMER*: I'm the author of sharc/density, but that does not mean the following benchmarks or results are inaccurate.

First, a quick presentation for anyone interested; all the software here is open source:

sharc is a simple command-line archiver, GPL-licensed and C99-compatible, which serves as a frontend to density (https://github.com/centaurean/sharc)

density is the compression library, BSD-licensed and C99-compatible as well (https://github.com/centaurean/density)

- its aim is maximum efficiency, meaning speed combined with compression ratio; I'll discuss this later
- the 2 algorithms currently available in density (more coming) look simple on the surface (one is dictionary-based, the other uses shifting dictionaries and predictions), but sometimes simple algorithms work wonders
- a lot of craft went into code optimization for absolute speed
- it is fully streamable
- it has a very simple-to-use API

I know everyone's curious about benchmarks (me included), so here we go, compared to LZO and LZ4, which are the most similar compression algorithms I can think of in terms of performance:

Benchmark platform: MacBook Pro, OS X 10.10.1, 2.3 GHz Intel Core i7, 8 GB 1600 MHz DDR, SSD

All programs were compiled with their standard options (standard makefile) using clang (note that I also tried with gcc-4.9 and it makes no difference whatsoever). Each was run a minimum of 5 times and timed using the best "user" value from the "time" utility. They are of course all single-threaded, otherwise the comparison would make no sense.
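The timing procedure described above can be sketched like this; the `true` command stands in for the actual archiver invocation, which is not shown here:

```shell
# Run a command 5 times under `time -p` and keep the best "user" value.
# `true` is a placeholder for the real command being benchmarked.
best=
for i in 1 2 3 4 5; do
  u=$( { time -p true; } 2>&1 | awk '/^user/ {print $2}' )
  if [ -z "$best" ] || awk "BEGIN{exit !($u < $best)}"; then best=$u; fi
done
echo "best user time: ${best}s"
```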

The file used was the world-famous enwik8, but any compressible data would do!

A measure of efficiency? What is a good measure of efficiency? That's a tough question... One thing is for sure though: for a compression library there are only two measurables, speed and ratio.

If, let's say, speed and ratio are of equal importance, we could use the following formula: efficiency = (1/ratio)/roundtrip-time. Here are the results:

Program | Efficiency (1), higher is better
sharc -c1 | 7.63
sharc -c2 | 4.20
lz4 -1 | 3.08
lz4 -9 | 0.56
lzop -1 | 2.61
lzop -9 | 0.17

But of course, everyone will agree that the difficulty of improving the ratio usually grows much faster than the difficulty of making the algorithm faster. As an extreme case, let's say that ratio matters massively more to us than speed.

A good measure of efficiency in that case would be something like: efficiency = (1/(ratio ^ 10))/roundtrip-time. Yes, that's the ratio raised to the power of 10, so we can be pretty sure it is REALLY more important. The table is now:

Program | Efficiency (10), higher is better
sharc -c1 | 604.07
sharc -c2 | 1239.01
lz4 -1 | 484.98
lz4 -9 | 865.78
lzop -1 | 430.06
lzop -9 | 483.53

The result values go through the roof because of the powers, so let's express each one as a percentage of the sum of all values, and walk through ratio powers of 1 to 10. This is what we get:

[Chart: each program's efficiency as a percentage of the sum of all values; lower axis: ratio powers 1 to 10]

Thank you for reading, and let me know what you think!