
Thread: Compressor benchmark : trying a new presentation

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts

    Compressor benchmark : trying a new presentation

    Hi

    I'm currently interested in trying a new compressor comparison diagram, one able to use speed as much as compression strength to produce a useful comparison result.

    As Shelwien already pointed out in an earlier post, speed is only valuable insofar as you need it. Over a 50KB/s pipe, it's plainly useless to compress at 100MB/s; more compression power is clearly the better choice.

    Now, depending on usage, maybe 50KB/s is not the right compression target. As another example, one could want to compress as fast as the HDD can sustain, which probably means something around 30-50MB/s for a mechanical hard drive, and much higher for an SSD. In between, you've got many LAN file-transmission scenarios, and fast Internet ones. In fact, who knows which speed interests you?

    As an attempt to address this, I've built a scenario in which a file is compressed, sent over a variable-speed pipe, and then decompressed. The times are added up and compared for different pipe speeds. Instead of presenting the raw figures, only the relative positions of the compression programs are considered, with the fastest alternative always at the top of the chart for a given speed.
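
    To make the computation concrete, here is a minimal sketch (Python, with made-up figures rather than the CompressionRatings data; speeds are measured on the uncompressed data):

    Code:
    def total_time(size, ratio, comp_speed, decomp_speed, pipe_speed):
        """size in MB; speeds in MB/s; ratio = uncompressed / compressed size."""
        return (size / comp_speed              # compress
                + (size / ratio) / pipe_speed  # send compressed data over the pipe
                + size / decomp_speed)         # decompress

    codecs = {  # hypothetical (ratio, compression MB/s, decompression MB/s)
        "fast_lz":  (2.0, 200.0, 500.0),
        "balanced": (3.0,  20.0, 150.0),
        "strong":   (4.0,   1.5,  30.0),
    }

    for pipe in (0.05, 1.0, 10.0, 100.0):  # pipe speed in MB/s
        ranked = sorted(codecs, key=lambda c: total_time(100.0, *codecs[c], pipe))
        print(f"{pipe:6.2f} MB/s pipe ->", ranked)  # fastest total time first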

    As a first attempt at this strategy, it gives the following diagram, based on figures published by the CompressionRatings public benchmark:

    [chart: relative ranking of compression programs, fastest at the top, across pipe speeds]

    Obviously, one could say that this "file transmission scenario" is not one-size-fits-all. For example, in a distribution scenario, compression time is (mostly) negligible since it happens offline. Overlapped modes are not considered either. I'm sure many other examples can be found.
    At least, this is an attempt at using compression and decompression speed together with compression ratio to produce a hopefully useful comparison chart.

    Ideas and comments are very welcome.

    Extracted from :
    http://fastcompression.blogspot.com/...w-ranking.html
    Last edited by Cyan; 6th April 2011 at 00:15.

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I suggest:
    1. Putting it in a spreadsheet or an applet and letting users vary the parameters: encoding CPU speed, decoding CPU speed, pipe speed.
    2. Adding data size and pipe latency as parameters (see the sketch after this list).
    3. Using not a single speed measure but a weighted average over several samples, with the user able to choose the weights.

    ADDED:
    4. Replacing the image on your blog with a narrower one; it looks terrible to have the blog archives drawn over it.
    5. I find graphs like this one:

    [image: conventional time/speed line graph]

    more intuitive. The one you propose takes a while to get used to, but offers superior readability.
    6. It would be good if you gave the exact formulas you use to produce it.
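
    For illustration, a minimal sketch of such a parameterized model (Python; every name and number here is hypothetical, not taken from any benchmark):

    Code:
    def transfer_time(size, ratio, enc_speed, dec_speed, pipe_speed, latency=0.0):
        """size in MB; speeds in MB/s on uncompressed data; latency in seconds."""
        return (size / enc_speed               # encoding on the sender's CPU
                + latency                      # pipe latency, paid once
                + (size / ratio) / pipe_speed  # transmission of compressed data
                + size / dec_speed)            # decoding on the receiver's CPU

    def weighted_speed(speeds, weights):
        """Suggestion 3: one figure from several samples, user-chosen weights."""
        return sum(s * w for s, w in zip(speeds, weights)) / sum(weights)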
    Last edited by m^2; 6th April 2011 at 00:38.

  3. #3
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    Thanks for the suggestions, m^2.
    Indeed, the proposed ranking graph may not be easy to understand without first seeing the shortcomings of the more direct time/speed representation.
    I've therefore updated the benchmark page with both graphs, since the first one seems necessary to understand the second one.

    It would be good if you gave the exact formulas you use to produce it.
    For the white graph, it's simply Compression Time + Transmission Time + Decompression Time.
    For the dark graph, we use the same data set, but then the results are scaled: the fastest program gets the score "100", and all the others get a lower score depending on their speed difference. There is a magnification factor; I tried several, and imho all are fine, since none changes the ranking. It's just a way to make the differences more visible.
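
    To illustrate that scaling (the exact magnification factor from the blog is not reproduced here; a simple power of the time ratio is assumed):

    Code:
    def scores(times, magnification=1.0):
        """Fastest program gets 100; the others get proportionally lower scores.
        Raising `magnification` spreads the field without changing the ranking."""
        best = min(times.values())
        return {name: 100.0 * (best / t) ** magnification
                for name, t in times.items()}

    print(scores({"A": 10.0, "B": 12.5, "C": 40.0}, magnification=2.0))
    # {'A': 100.0, 'B': 64.0, 'C': 6.25}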

  4. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Thanks for the answer.
    Quote Originally Posted by Cyan View Post
    For the white graph, it's simply Compression Time + Transmission Time + Decompression Time.
    For the dark graph, we use the same data set, but then the results are scaled: the fastest program gets the score "100", and all the others get a lower score depending on their speed difference.
    I guess I was tired yesterday... that's what I thought, but then I got confused by the lines that in some places step down from 100. Now I see that it's just how you interpolate a curve from the discrete data points.
    Quote Originally Posted by Cyan View Post
    There is a magnification factor; I tried several, and imho all are fine, since none changes the ranking. It's just a way to make the differences more visible.
    Mhm.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    @Cyan: Cool graph and I like the adaptive scaling idea.
    But extrapolations are imho wrong.
    Benchmarks that show things like "this program would be the best if it had all the necessary parts" are no good,
    except to the authors of those "winning" programs.
    Why don't you compare them in memory instead? (like inikep did).
    Or how about running 4 instances in parallel? (so that even single-threaded codecs would fully use the cpu)
    Also, it looks like you're still using some virtual "process times" instead of wall time, though I may be wrong about that.
    Anyway, stats that don't correspond to any physical measurement are not very useful: if a program takes 10s of real time
    and 1s of "process time", that doesn't mean it would still take 10s with a 90% CPU load from other processes.
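
    The distinction is easy to demonstrate with a minimal sketch (Python, zlib as a stand-in workload):

    Code:
    import os, time, zlib

    data = os.urandom(16 << 20)  # 16 MB dummy workload

    t0_wall, t0_cpu = time.perf_counter(), time.process_time()
    zlib.compress(data, 6)
    wall = time.perf_counter() - t0_wall  # elapsed real time: what the user waits for
    cpu = time.process_time() - t0_cpu    # CPU time charged to this process only
    print(f"wall {wall:.3f}s, process {cpu:.3f}s")  # these diverge under system load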

    Another fairly suspicious thing is TCP speed. It's complicated, and I don't really believe in 300MB/s TCP transfers (with a common OS).
    How about making a test with a real TCP connection? (Via localhost at least, though ideally it should be done with 2 machines.)
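
    A rough localhost probe could look like this (Python sketch; one machine only, so still optimistic compared to two real hosts, and the port number is arbitrary):

    Code:
    import socket, threading, time

    PORT, CHUNK, TOTAL = 54321, 1 << 16, 256 << 20  # 64 KB chunks, 256 MB total

    def sink():  # receiver: accept one connection and drain it
        srv = socket.create_server(("127.0.0.1", PORT))
        conn, _ = srv.accept()
        while conn.recv(CHUNK):
            pass

    threading.Thread(target=sink, daemon=True).start()
    time.sleep(0.2)  # let the listener come up

    sock = socket.create_connection(("127.0.0.1", PORT))
    buf = b"\0" * CHUNK
    t0 = time.perf_counter()
    sent = 0
    while sent < TOTAL:
        sock.sendall(buf)
        sent += CHUNK
    sock.close()
    print(f"{TOTAL / (time.perf_counter() - t0) / 2**20:.0f} MB/s over localhost TCP")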

  6. #6
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    Source benchmark data directly come from CompressionRatings.
    More specifically from this page :
    http://compressionratings.com/sort.c..._sum.full+15nr

    I could have created my own benchmark to generate another set of data; that would not make the data any more reliable, however.

    I like the CR results, since the benchmark strategy is quite clear and consistent. It is also relatively precise on speeds, and the benchmark corpus is respectably diverse.
    It does have shortcomings at very high speeds (>100MB/s), since it depends on RAM drive speed, but very few programs so far are able to beat RAM drive speeds.

    For such speeds, I agree that "in memory" tests would be more interesting, such as the excellent work from inikep. This is, however, quite a bit more complex to set up, and not generalizable to all compressors. It would also create two different sets of data, and therefore no "unified" presentation methodology.

    Mhm.
    Something wrong, m^2?

  7. #7
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Cyan View Post
    Something wrong, m^2 ?
    No, I was just acknowledging that I noticed and understood; that it's OK and that I have nothing in particular to add.
    I guess I compressed the statement too much.

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    > Source benchmark data directly come from CompressionRatings.

    I don't like that idea. There are many quirks, especially when
    comparing qpress vs freearc etc (for qpress, Sami adds the tar time, afaik).
    Also, afaik it's still process time, and nobody knows how it's estimated.

    > I could have created my own benchmark to generate another set of
    > data, this would not make these data more reliable however.

    It might, actually. Sami is too smart in a way - I can't predict all
    the things he could do there.
    Even a plain enwik9 test with wall times would make more sense, afaik.

    > It has its shortcomings though on very high speeds, since it depends
    > on RAM Drive speed, but very few programs so far are able to beat
    > RAM Drive speeds.

    I guess it depends on the machine, the ramdrive software, and the benchmark scripts.
    I just tried running "timetest copy /b enwik9 enwik9a" here
    ( Q9450 @ 3.52ghz, http://www.superspeed.com/desktop/ramdisk.php )
    and got 1.078s for it.
    (timetest is http://nishi.dreamhosters.com/u/timetest.exe http://nishi.dreamhosters.com/u/timetest.cpp )

    > For such speeds, I agree that "in memory" tests would be more
    > interesting, such as the excellent work from inikep. This is
    > however, quite more complex to setup,

    Inikep probably can share his scripts - I doubt there's anything "exclusive" there.

    > It would also create 2 different sets of data,
    > therefore no "unified" presentation methodology.

    Yes, but there's no such thing anyway.
    I've told Sami and other people this many times: I think what we
    need is not a "global" benchmark (like most of them are), but detailed reviews
    of specific usage cases, like "I need to back up VM images - which compressor do I use?"
    or "How do I make game rips?". Afaik, hardware sites do this all the time (chip reviews etc),
    but somehow we only have random virtual timings for compressors.

  9. #9
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    309
    Thanks
    68
    Thanked 173 Times in 64 Posts
    Quote Originally Posted by Shelwien View Post
    > For such speeds, I agree that "in memory" tests would be more
    > interesting, such as the excellent work from inikep. This is
    > however, quite more complex to setup,

    Inikep probably can share his scripts - I doubt there's anything "exclusive" there.
    There are no scripts; I've joined all the compressors into a single exe. All of them have in-memory compress() and decompress() functions. At the beginning, an input file is read into memory; then all the compressors are used to compress it. That's all. This approach has the big advantage of using the same compiler with the same optimizations. The idea is based on http://www.quicklz.com/bench.html
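
    The loop is simple enough to sketch; here in Python with standard-library codecs as stand-ins (inikep's actual program links the compressors' C compress()/decompress() functions into one binary; none of these names come from his code):

    Code:
    import bz2, lzma, time, zlib

    with open("enwik8", "rb") as f:  # any test file, read into memory up front
        data = f.read()

    for name, mod in (("zlib", zlib), ("bz2", bz2), ("lzma", lzma)):
        t0 = time.perf_counter()
        packed = mod.compress(data)
        t1 = time.perf_counter()
        assert mod.decompress(packed) == data  # round-trip check
        t2 = time.perf_counter()
        print(f"{name}: ratio {len(data) / len(packed):.2f}, "
              f"comp {t1 - t0:.2f}s, decomp {t2 - t1:.2f}s")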

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    I meant all the sources necessary to print that - http://encode.su/threads/1253-LZO-Pr...ll=1#post24598
    If you can upload them, Cyan would probably be able to add his codecs there.

  11. #11
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @inikep: "I've joined all compressors into a single exe. ...
    This approach has a big advantage of using the same compiler with the same optimizations."

    Can you please share such a super-compressor binary?

    best regards

  12. #12
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    309
    Thanks
    68
    Thanked 173 Times in 64 Posts
    Quote Originally Posted by Shelwien View Post
    I meant all sources necessary to print that - http://encode.su/threads/1253-LZO-Pr...ll=1#post24598
    This output is generated by the program I described.

    Quote Originally Posted by joerg View Post
    Can you please share such a super-compressor-binary ?
    best regards
    It's here (binary and sources):
    http://encode.su/threads/1266-In-mem...y)-compressors

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    Yes, thanks
