
Thread: Another benchmark

  1. #1
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts

    Another benchmark

    I just put together another benchmark and thought it might be of interest. I have run it on two computers so far; the results are here: http://quixdb.github.io/squash/#benchmarks. Hopefully more are coming soon; I have a BeagleBoard-xM, PandaBoard ES, and Raspberry Pi, which should be interesting.

    The benchmark is fairly limited in scope. It uses Squash (which is basically just an abstraction library I've been working on), so only software with a Squash plugin is supported. Hopefully that list will grow over time, but right now it is: bzip2, FastLZ, LZ4, liblzma, LZF, LZO, QuickLZ, Snappy, and zlib. The other thing limiting its scope is that it currently only uses the default options for each codec. I plan to change this when I have time and provide an interface for exploring the effects of the different options. I'm also still learning the Google Visualization API, so hopefully the charts will improve over time.

  2. Thanks:

    Black_Fox (2nd August 2013)

  3. #2
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Is that (see attachment) how the results list ought to look? To me it looks like something is missing. Maybe you would like to use a standard HTML table or a comma-separated text file instead. That would be far more compatible than the stuff Google comes up with.
    [Attachment: results.png (12.2 KB)]
    Last edited by just a worm; 2nd August 2013 at 09:53.

  4. #3
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by just a worm View Post
    Is that (see attachment) how the results list ought to look? To me it looks like something is missing.
    No, it's supposed to have a table and a bunch (currently 5) of charts per benchmark. Perhaps you have JavaScript turned off? I've just added a <noscript> to warn people who have disabled JS.

    Here is what it is supposed to look like (tested with Chromium 27, Firefox 22, and Firefox Mobile 22):

    [Attachment: Squash Benchmarks.png (281.1 KB)]

    Maybe you would like to use a standard HTML table or a comma-separated text file instead. That would be far more compatible than the stuff Google comes up with.
    There are *plenty* of simple table / CSV benchmarks out there. http://www.mattmahoney.net/dc/text.html is particularly impressive. One more wouldn't really add anything.
    Last edited by nemequ; 2nd August 2013 at 11:23. Reason: Add screenshot

  5. #4
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by nemequ View Post
    No, it's supposed to have a table and a bunch (currently 5) of charts per benchmark.
    Thanks for the screenshot.

    Quote Originally Posted by nemequ
    Perhaps you have JavaScript turned off? I've just added a <noscript> to warn people who have disabled JS.
    Actually, I have JavaScript enabled, so I can't see the noscript message. But from the screenshot it looks like it is not pure JavaScript.

    There are *plenty* of simple table / CSV benchmarks out there. http://www.mattmahoney.net/dc/text.html is particularly impressive. One more wouldn't really add anything.
    You are probably right about that. Even so, presenting the data in a compatible way is beneficial in several ways:

    - people can print it more easily
    - people can copy the data into their preferred application to change how the plot is displayed, to filter the data, or to do their own calculations
    - everyone benefits from your contribution, not only those with compatible browsers

    If you are still convinced by the data display solution that Google offers then you might want to add the raw data so both forms are available.

    Edit: Back to the real topic: it's interesting to see the huge differences in speed.
    Last edited by just a worm; 2nd August 2013 at 11:52.

  6. #5
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by just a worm View Post
    Thanks for the screenshot.

    Actually, I have JavaScript enabled, so I can't see the noscript message. But from the screenshot it looks like it is not pure JavaScript.
    Interesting. Perhaps you're using an old version of Firefox? I'm probably using some random feature which requires a relatively new browser. If you see anything in the error console I can at least take a look.

    Quote Originally Posted by just a worm View Post
    You are probably right about that. Even so, presenting the data in a compatible way is beneficial in several ways:

    - people can print it more easily
    - people can copy the data into their preferred application to change how the plot is displayed, to filter the data, or to do their own calculations
    - everyone benefits from your contribution, not only those with compatible browsers

    If you are still convinced by the data display solution that Google offers then you might want to add the raw data so both forms are available.
    The raw data is embedded in the HTML in a very simple JSON format. You may not be able to import it into Excel but for most programmers it's a lot easier to work with, and I'd be surprised if there were many non-programmers around here.

    If the problem was the Google Visualizations API you should see boxes with error messages, so I think the problem is somewhere in my code. If you tell me what version of FF you're using I can try to take a look at it... no promises about getting it to work, but at the very least it should display an error message to explain why it doesn't work.

  7. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I can see the results just fine. http://quixdb.github.io/squash/bench...525.html#iliad

    I would suggest sorting the tables and graphs from best to worst (whether that means size or speed). The scatter plots are nice, once I figured out that the name of the compressor appears when you hover the mouse over the dot. They might be easier to view if speed used a log scale instead of a linear scale.

    Also, it would help to describe the test conditions. What CPU? What is "iliad"? (I recognize enwik.) It would help to provide links to the data sets.

    From the other documentation I guess that these are memory to memory tests using Squash APIs around the corresponding libraries. I wonder if developers will bother to write them just to get their compressors listed. I could probably write one for libzpaq but it's not high on my list of priorities. I can just run zpaq on enwik8 to get the same results.

  8. #7
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    I thought that my Chrome was also unable to open the benchmarks, but then I found out I actually have to click on one of iliad/enwik8, otherwise nothing shows up.

  9. #8
    Member just a worm's Avatar
    Join Date
    Aug 2013
    Location
    planet "earth"
    Posts
    96
    Thanks
    29
    Thanked 6 Times in 5 Posts
    Quote Originally Posted by nemequ View Post
    ... so I think the problem is somewhere in my code. If you tell me what version of FF you're using I can try to take a look at it...
    Ok, I will send you the details via private message to keep the discussion out of this thread.

  10. #9
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Thanks for all the feedback!

    Quote Originally Posted by Matt Mahoney View Post
    I would suggest sorting the tables and graphs from best to worst (whether that means size or speed).
    You can sort the table however you want by clicking one of the column headers. As for the charts, I'm not actually sure how to sort that independently of the model, and all the tables and charts are just different views of the same model. I'll look into how to do that, but for now I can only do one or the other for both bar charts, so I think I'll just keep it in alphabetical order since it makes it easier to correlate data with other charts.

    Quote Originally Posted by Matt Mahoney View Post
    The scatter plots are nice, once I figured out that the name of the compressor appears when you hover the mouse over the dot.
    The other difficult to discover feature is that you can click an item in the table to select it, and it will be selected in all of the other charts. In theory you should be able to click on a data point in any chart and have it selected in all the others, but right now the table and "Speed" chart don't get selected properly... still nice for switching back and forth between the scatter plots, though.

    Quote Originally Posted by Matt Mahoney View Post
    They might be easier to view if speed used a log scale instead of a linear scale.
    I actually tried that and didn't really like it. It makes the slower compressors easier to read at the expense of the faster ones, and linear scales are more intuitive so that's what I used. Perhaps I should just make it so the user can toggle between the two. In the meantime, I've changed the Core i5-2400 benchmark to be logarithmic. If people prefer that I can easily switch.

    Quote Originally Posted by Matt Mahoney View Post
    Also, it would help to describe the test conditions. What CPU?
    The CPU is described in the table at http://quixdb.github.io/squash/#benchmarks. I just added platform, which isn't too useful right now but should be more illuminating when I get around to benchmarking some embedded devices. Adding it to the individual benchmarks is a bit complicated since I'm not aware of a cross-platform way to dump CPU info (reading from /proc/cpuinfo obviously doesn't work on Windows, but I don't even know if it will work on OS X, BSD, Solaris, etc.)

    Quote Originally Posted by Matt Mahoney View Post
    What is "iliad"? (I recognize enwik.) It would help to provide links to the data sets.
    Ah, good point. I've added some information to http://quixdb.github.io/squash/#benchmarks describing the datasets. Unfortunately, adding it to the individual benchmarks is, once again, a bit tricky. This is largely about making a tool that makes it easy for people to benchmark whatever dataset they want, so that they have better data about the trade-offs of each codec on their hardware with their dataset; at that stage, all I really know about a dataset is a file name and size. The fact that it also makes it easy for me to generate a few generic datasets is mostly a happy by-product.

    Quote Originally Posted by Matt Mahoney View Post
    From the other documentation I guess that these are memory to memory tests using Squash APIs around the corresponding libraries.
    Actually, it compresses and decompresses from file to file (using a tmpfile()). For compressors which only provide a buffer-to-buffer API (as opposed to a zlib-style stream-based API) it will attempt to mmap the input and output files before falling back on some spectacularly inefficient code (which doesn't happen on the ones I posted).

    However, measurements use CPU time, not wall-clock time, so that shouldn't really matter. FWIW, I already record wall-clock time; switching to it is just a matter of changing a couple of characters in the JavaScript (lines 67 and 68, just replace "cpu" with "wall"). The only two files you need are the HTML and benchmark.js... some other stuff (the Google Visualizations API and jQuery) is pulled in from the internet, but it runs just fine from a file:// URL. If people prefer, I can switch to wall-clock time; I just thought CPU time would be more appropriate considering the purpose of this benchmark.
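    The CPU-time vs. wall-clock distinction is easy to demonstrate with a standalone C++ sketch (unrelated to the benchmark's own code): a workload that sleeps, or blocks on I/O, accumulates wall-clock time but almost no CPU time.

```cpp
// Demonstrates CPU time vs. wall-clock time: sleeping consumes wall
// time but almost no CPU time.
#include <chrono>
#include <ctime>
#include <thread>
#include <utility>

// Returns {cpu_seconds, wall_seconds} spent in a 100 ms sleep.
std::pair<double, double> time_sleep() {
  std::clock_t c0 = std::clock();                  // process CPU time
  auto w0 = std::chrono::steady_clock::now();      // wall clock

  std::this_thread::sleep_for(std::chrono::milliseconds(100));

  double cpu = double(std::clock() - c0) / CLOCKS_PER_SEC;
  double wall = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - w0).count();
  return {cpu, wall};
}
```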

    Quote Originally Posted by Matt Mahoney View Post
    I wonder if developers will bother to write them just to get their compressors listed. I could probably write one for libzpaq but it's not high on my list of priorities. I can just run zpaq on enwik8 to get the same results.
    That would be nice. Hopefully having a Squash plugin will provide some other perks in the near future as well: the biggest one being free bindings for any language Squash supports (currently just C and Vala, hence the "near future" bit). Also, if you provide it with a simple low-level API, it can provide you with convenience APIs (it already has file-to-file, buffer-to-buffer, and stream-based; I'm working on buffer-to-file and file-to-buffer). In short, it tries to do all the stuff that isn't actual compression/decompression for you.

    That said, Squash only supports single-file style compressors (libarchive already exists for archivers), so a plugin for libzpaq might be fairly awkward. lpaq would probably work but I haven't bothered since, according to your page, you no longer maintain it. If there is a reasonable way to use libzpaq like a single-file compressor I'd be happy to write the code. Basically, plugins implement callbacks from the SquashCodecFuncs struct. Fallbacks are provided wherever possible, but you need to implement at least either a buffer-to-buffer API (compress_buffer and decompress_buffer) or a zlib-style stream API (create_stream and process_stream).
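    To give a rough idea of the buffer-to-buffer shape, here is a trivial pair of callbacks. The signatures are illustrative only, not Squash's actual API; a real plugin would register functions of roughly this form in its SquashCodecFuncs struct.

```cpp
// Hypothetical callback shape (NOT Squash's real signatures): a "null
// codec" whose compress step just copies input to output.
#include <cstddef>
#include <cstring>

// Returns 0 on success, -1 if the output buffer is too small.
// On success, *out_len is updated to the number of bytes written.
int demo_compress_buffer(unsigned char* out, std::size_t* out_len,
                         const unsigned char* in, std::size_t in_len) {
  if (*out_len < in_len) return -1;
  std::memcpy(out, in, in_len);
  *out_len = in_len;
  return 0;
}

int demo_decompress_buffer(unsigned char* out, std::size_t* out_len,
                           const unsigned char* in, std::size_t in_len) {
  // The null codec's decompress step is identical to its compress step.
  return demo_compress_buffer(out, out_len, in, in_len);
}
```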

    Quote Originally Posted by Black_Fox View Post
    I thought that my Chrome was also unable to open the benchmarks, but then I found out I actually have to click on one of iliad/enwik8, otherwise nothing shows up.
    Good point, that's really not clear. I added some text to instruct people to choose a dataset.

  11. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Actually libzpaq has an API in C++ supporting generic stream compression. You extend two abstract base classes Reader and Writer with get() and put() methods for byte I/O (and optionally read() and write() for faster block I/O), and also an error handling callback function. The compressor supports 3 basic compression levels, but not any of the really fast methods used in the zpaq archiver. All the documentation is in libzpaq.h. You include libzpaq.h and link it to libzpaq.cpp. Here is a simple program that compresses from stdin to stdout at level 2 (of 3).

    Code:
    #include "libzpaq.h"
    #include <stdio.h>
    #include <stdlib.h>
    
    void libzpaq::error(const char* msg) {  // print message and exit
      fprintf(stderr, "Oops: %s\n", msg);
      exit(1);
    }
    
    class In: public libzpaq::Reader {
    public:
      int get() {return getchar();}  // returns byte 0..255 or -1 at EOF
    } in;
    
    class Out: public libzpaq::Writer {
    public:
      void put(int c) {putchar(c);}  // writes 1 byte 0..255
    } out;
    
    int main() {
      libzpaq::compress(&in, &out, 2);  // level may be 1, 2, or 3
    }
    The decompresser would use libzpaq::decompress(&in, &out);

    The documentation describes more advanced uses such as custom compression algorithms written in ZPAQL, attaching external preprocessors, and archive handling functions such as grouping and naming files and computing and verifying SHA-1 checksums. On x86 and x86-64 machines in Windows and Linux, the ZPAQL is translated to machine code and executed when compression starts as an optimization. Compile with -DNOJIT on other platforms.

    You can also use the zpaqd program as a single file compressor at levels 1, 2, or 3:

    zpaqd c 2 archive input
    zpaqd d archive output

    If you want to test zpaq this way:

    zpaq a archive.zpaq input -method 1 -until 0
    zpaq x archive.zpaq input -to output -force

    Method can be 1 (fastest) to 6 (best). -until 0 will overwrite archive.zpaq. -force will overwrite output.

  12. #11
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Actually libzpaq has an API in C++ supporting generic stream compression. You extend two abstract base classes Reader and Writer with get() and put() methods for byte I/O (and optionally read() and write() for faster block I/O), and also an error handling callback function. The compressor supports 3 basic compression levels, but not any of the really fast methods used in the zpaq archiver. All the documentation is in libzpaq.h. You include libzpaq.h and link it to libzpaq.cpp. Here is a simple program that compresses from stdin to stdout at level 2 (of 3).
    Excellent, thanks for the pointers. I just put together a plugin, but before I commit it... Is there a repository other than https://github.com/zpaq/zpaq (preferably git, but I could make other stuff work)? I'd like to use a git submodule to include zpaq, but that repository hasn't been updated in a while (it is stuck at 6.19).

  13. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I don't maintain the github fork. The latest version is http://mattmahoney.net/dc/zpaq.html
    The zpaq download has the latest version of libzpaq included.

  14. #13
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I don't maintain the github fork. The latest version is http://mattmahoney.net/dc/zpaq.html
    The zpaq download has the latest version of libzpaq included.
    Okay, ZPAQ is now included. I just copied the relevant files into my repo for now. If a good repository appears in the future I'll switch to a submodule. Thanks again for the info about the API.

    On another note, I also added SHARC, which turned out to be quite interesting. It's very fast on the i5-2400, but performance is terrible on the Atom D525. Definitely looking forward to seeing the results on ARM; I'll try to take care of that next week.

    If anyone else has any suggestions for other algorithms they would like to see included please let me know.

  15. Thanks:

    Matt Mahoney (13th August 2013)
