
Thread: Filesystem benchmark

  1. #1
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts

    Filesystem benchmark

    I posted the benchmark in another thread before, but I think it deserves its own thread.
    It's a tool that can test the performance of codecs when compressing small pieces of data. While it was designed for filesystems, it can also be used for stuff like network packets.

    The new version is tuned for portability. I wanted it to work on almost anything with a C++03 compiler, but I have only tested it on x86 Linux and Windows with gcc and clang, so don't expect wonders. I do take bug reports, though. To get there, I removed the codecs that were very weak, though reintroducing them is easy. Supported codecs:
    LZ4 r33
    LZ4hc r7
    snappy 1.0.4 r49
    lzjb 2010
    lzo 2.05 1x_1
    lzo 2.05 1x_999
    quicklz 1.5.1b6 -1
    quicklz 1.5.1b6 -3
    zlib 1.2.5 -1
    zlib 1.2.5 -6
    zlib 1.2.5 -9

    I also added multithreading support.

    Its output is not human readable, but a csv for easier analysis.

    Code:
    fsbench 0.5 (c) Dell Inc.  Written by P.Skibinski; FS mod by m^2
    usage: fsbench [options] input
     -iX: number of iterations (default = 1)
     -a: test all compressors, also slow ones
     -bX: filesystem block size(default = 2147483647)
     -sX: disk sector size(default = 1)e
     -v: verify that decompression went fine
    Attached a MinGW 4.5.2 x86 build and sources.


    I am going to do comprehensive testing soon. I would be thankful for computing time contributions...please contact me if you want to help.

    UPDATE:
    From now on, you can get the latest version of fsbench here:
    https://chiselapp.com/user/Justin_be...sitory/fsbench

    To check it out:
    Code:
    fossil clone https://chiselapp.com/user/Justin_be_my_guide/repository/fsbench fsbench
    fossil open fsbench
    Last edited by m^2; 14th March 2014 at 21:05.

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    The 1st fix already.
    Cyan contacted me to say that I was using LZ4 incorrectly. Also, I found that LZ4hc is very slow on some CPUs, so it really deserves to be run with -a only.

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    0.52 is out. The OS needs time to warm up, which skewed the results unless the tool was used carefully, so I added some warm-up code. Thanks to Cyan for pushing for it.
    Also, I updated LZ4hc to the just-released r8 and made several minor improvements.
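    The warm-up idea can be sketched roughly as follows. This is not fsbench's actual code: `time_with_warmup` and its parameters are hypothetical names, and the sketch uses C++11 `<chrono>` for brevity even though fsbench itself targets C++03. The workload runs a few times untimed first, and only the later runs are measured.

```cpp
#include <chrono>
#include <cstddef>

// Runs `work` `warmup_runs` times without timing it, then `timed_runs` times
// with timing, and returns the fastest timed run in milliseconds.
// Hypothetical helper; fsbench's real warm-up logic may differ.
template <typename Work>
double time_with_warmup(Work work, std::size_t warmup_runs, std::size_t timed_runs)
{
    typedef std::chrono::steady_clock clock;

    for (std::size_t i = 0; i < warmup_runs; ++i)
        work(); // untimed: lets caches and CPU clocks settle

    double best_ms = -1.0; // -1 means "nothing measured yet"
    for (std::size_t i = 0; i < timed_runs; ++i)
    {
        clock::time_point start = clock::now();
        work();
        std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
        if (best_ms < 0.0 || elapsed.count() < best_ms)
            best_ms = elapsed.count();
    }
    return best_ms;
}
```

    Reporting the fastest run rather than the mean is one common choice for this kind of microbenchmark, because the slower runs mostly reflect interference from the rest of the system.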

  4. #4
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    0.53 is out. Cyan notified me that the warm-up was insufficient; it's fixed. Also, this time I included x64 Windows binaries. And again, there are numerous minor improvements.

    Also, I have a question. The program is WAY slower when compiled with GCC's link time optimization. Example session:
    Code:
    m@m-Nokia:~/Downloads/e/src$ ./fsbench ../../scc.tar 
    fsbench 0.53 (c) Dell Inc.  Written by P.Skibinski; FS mod by m^2
    memcpy            = 382 ms (541 MB/s), 211927552->211927552
    LZ4 r35,4605,1564,101594941
    snappy 1.0.4 r49,6343,2207,104714741
    lzjb 2010,7579,3398,120620210
    lzo 2.05 1x_1,2025,512,208591210
    quicklz 1.5.1b6 -1,5753,3570,94684783
    quicklz 1.5.1b6 -3,52256,1835,81784109
    zlib 1.2.5 -1,33994,6646,77214310
    zlib 1.2.5 -6,83665,6114,68179175
    all               = 222448 ms (0 MB/s), 0->0
    done... (1 iterations)
    m@m-Nokia:~/Downloads/e/src$ ./fsbench_nolto ../../scc.tar 
    fsbench 0.53 (c) Dell Inc.  Written by P.Skibinski; FS mod by m^2
    memcpy            = 363 ms (570 MB/s), 211927552->211927552
    LZ4 r35,2221,705,101594941
    snappy 1.0.4 r49,3096,1064,104714741
    lzjb 2010,3581,1713,120620210
    lzo 2.05 1x_1,503,415,208591210
    quicklz 1.5.1b6 -1,2448,1819,94684783
    quicklz 1.5.1b6 -3,32999,1021,81784109
    zlib 1.2.5 -1,16769,2960,77214310
    zlib 1.2.5 -6,47481,2721,68179175
    all               = 121908 ms (0 MB/s), 0->0
    done... (1 iterations)
    I checked it on gcc 4.5.2 x86 on Linux and MinGW 4.6.2 AMD64 on Windows.
    Does anybody know what's up?
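    To compare two such sessions without reading the numbers by eye, the CSV rows can be parsed and divided. The sketch below assumes the columns are codec name, compression time in ms, decompression time in ms and compressed size in bytes; that reading is a guess from the output, not something the tool documents here. `Row`, `parse_row` and `slowdown` are hypothetical names.

```cpp
#include <cstdlib>
#include <string>

// One row of the CSV output shown above. The column meanings are an
// assumption: codec name, compression ms, decompression ms, compressed bytes.
struct Row
{
    std::string codec;
    unsigned long c_ms;
    unsigned long d_ms;
    unsigned long long out_bytes;
};

// Splits "name,ctime,dtime,size" from the right, so commas in the codec
// name (none occur in the session above) would not break parsing.
// Returns false if the line does not contain three commas.
bool parse_row(const std::string& line, Row& row)
{
    std::string::size_type c3 = line.rfind(',');
    if (c3 == std::string::npos || c3 == 0) return false;
    std::string::size_type c2 = line.rfind(',', c3 - 1);
    if (c2 == std::string::npos || c2 == 0) return false;
    std::string::size_type c1 = line.rfind(',', c2 - 1);
    if (c1 == std::string::npos) return false;

    row.codec     = line.substr(0, c1);
    row.c_ms      = std::strtoul(line.substr(c1 + 1, c2 - c1 - 1).c_str(), 0, 10);
    row.d_ms      = std::strtoul(line.substr(c2 + 1, c3 - c2 - 1).c_str(), 0, 10);
    row.out_bytes = std::strtoull(line.substr(c3 + 1).c_str(), 0, 10);
    return true;
}

// Slowdown factor of one build relative to another for the same codec.
double slowdown(unsigned long slow_ms, unsigned long fast_ms)
{
    return static_cast<double>(slow_ms) / static_cast<double>(fast_ms);
}
```

    For the LZ4 rows above, `slowdown(4605, 2221)` comes out at a little over 2, which matches the overall picture: the LTO build takes roughly twice as long.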
    Last edited by m^2; 3rd October 2011 at 13:46.

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Update.

    This is mostly a portability improvement.
    I changed the build system to CMake and removed the pthread dependency on Windows. So far I have managed to compile it on Windows with VC9, mingw 4.5.2, mingw64 4.6.2, Clang 2.9.0 and llvm-gcc 2.9.0-4.2.1. It failed to compile with VC6 because quicklz includes a header that VC6 doesn't have; I didn't dig further, since I think this compiler is not very interesting anyway. On Linux it compiles with gcc 4.5.2 (32- and 64-bit targets) and Clang 2.8.1.
    This time only, I included all the Windows binaries. One remark: due to a CMake bug, the Clang binary is not compiled with full optimizations, just -O3, which skips link time code generation. I didn't fight it; I just reported it and am waiting. Though I wonder whether LTO has an adverse effect there like it does on gcc.

    Also, LZ4 is updated to r36. And I made numerous tiny changes all around.

  6. #6
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Some incomplete, rough results (ctime/dtime):
    No code has to be inserted here.
    Last edited by m^2; 11th October 2011 at 19:51.

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    please include the compiler versions in the headers of the table. vc2010 and icl11 versus gcc3/4 would be most interesting for me

  8. #8
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    They are the same as mentioned before, but I added them to the table.
    You can make your own tests, hopefully the benchmark will work with icc / vc2010.

    Minor update, I had a stupid bug, in printf I used %ull instead of %llu.
    BTW in recent days I found 3 bugs in different compilers and 1 in CMake.
    The latest one is (known already) that mingw doesn't work with %llu.
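    For anyone wondering what the `%ull` typo actually does: printf reads it as `%u` followed by the two literal characters `ll`. A small sketch (the function name is made up for illustration) that passes an `unsigned int` on purpose, so the call itself stays well defined:

```cpp
#include <cstdio>
#include <string>

// Formats a small unsigned value with the mistaken "%ull" format.
// printf reads this as "%u" followed by the literal characters "ll",
// so the argument passed here is deliberately an unsigned int.
std::string format_with_ull_typo(unsigned int value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%ull", value);
    return buffer;
}
```

    Calling `format_with_ull_typo(5u)` gives the text `5ll`. With a real `unsigned long long` argument the behaviour is undefined, because `%u` expects an `unsigned int`.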
    Last edited by m^2; 11th October 2011 at 20:01.

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    As far as i know, %llu seems to work under MinGW, but MinGW generates a lot of useless warnings during compilation.
    A quick search over Internet shows this is indeed a long-time well known defect of MinGW.

    Btw, nice work with your benchmark suite !

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Cyan View Post
    As far as i know, %llu seems to work under MinGW, but MinGW generates a lot of useless warnings during compilation.
    A quick search over Internet shows this is indeed a long-time well known defect of MinGW.

    Here both mingw and llvm-gcc had problems with (from memory):
    Code:
    #include <stdio.h>
    int main()
    {
        unsigned long long x=1, y=1;
        printf("%llu %llu",x,y);
        return 0;
    }
    The result was "1 0". Precisely, it treated %llu as 32 bit value and treated a single 64 bit value as 2 32 bit ones. I saw others mentioning the bug, so I stopped digging further.
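    One way to picture the "1 0" result, assuming a little-endian x86 target and a printf that consumes only 32 bits per `%llu`: the first conversion reads the low word of x and the second reads the high word of x, never reaching y. The helper below (hypothetical names) only shows that split; it does not reproduce the buggy runtime.

```cpp
#include <stdint.h>

// The two 32-bit words that make up one 64-bit value, in the order a
// little-endian x86 machine stores them (low word first).
struct Halves
{
    uint32_t low;
    uint32_t high;
};

Halves split_u64(uint64_t value)
{
    Halves h;
    h.low  = static_cast<uint32_t>(value & 0xFFFFFFFFu);
    h.high = static_cast<uint32_t>(value >> 32);
    return h;
}
```

    For x = 1, `split_u64` yields low = 1 and high = 0, which are exactly the two numbers that were printed.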
    Quote Originally Posted by Cyan View Post
    Btw, nice work with your benchmark suite !
    Thank you.

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    To work around the g++ bug, instead of %llu I use printf("%1.0f", (double) x); (not exact, I know, but lots of times I don't need all 64 bits). cout << x works.
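    A small sketch of why the double route is "not exact" while the stream route is: a double carries a 53-bit significand, so values above 2^53 can lose their low bits. The function names are made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <stdint.h>

// Exact: stream insertion has an overload for 64-bit integers.
std::string via_ostream(uint64_t x)
{
    std::ostringstream out;
    out << x;
    return out.str();
}

// Lossy above 2^53: a double has only a 53-bit significand, so the
// round trip through double may change the low bits.
uint64_t via_double(uint64_t x)
{
    return static_cast<uint64_t>(static_cast<double>(x));
}
```

    For example, 2^53 + 1 (9007199254740993) survives `via_ostream` unchanged but comes back from `via_double` as 9007199254740992. For file sizes well below 2^53 bytes the two agree.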

  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    me too, only "%.0lf" (in scanf, f stands for float and lf for double; in printf both take a double, because float arguments are promoted to double)

  13. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Yesterday I went on to fix warnings that stem from different int sizes (e.g. my Linux wanted uint64_t to be printed as %lu).
    The solution was, I think, the cleanest possible and looked like:
    Code:
    #if ULLONG_MAX == UINT64_MAX
    # define UINT64_FORMAT "%llu"
    #else
    # if ULONG_MAX == UINT64_MAX
    #  define UINT64_FORMAT "%lu"
    # else
    #  error "Can't figure out what format to use"
    # endif
    #endif
    The same for uint32_t and size_t.
    Then I decided to screw it and use iostream.


    ADDED: I used to use printf even in C++ programs, because I find its formats much more readable, but I didn't realize how messy it becomes when you want the stuff to be portable....
    Last edited by m^2; 12th October 2011 at 11:11.

  14. #14
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts
    Been bitten too :( %zu is C99 but NOT C++, so if you turn on full warnings, strict ANSI etc. on g++, it will refuse to compile a string with %zu in it. But if the program is C99 instead, it's fine... Same for the PRIxPTR etc. macros in <inttypes.h>. The only working thing is to go with std::ostream.

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Today another solution came to my mind. I'm staying with cout here, but this might be useful one day. Simply, convert all ints that don't have their own format strings to (unsigned) long longs and print them as such.
    Code:
    // Must be defined before <stdint.h> is first included, so that C++03
    // compilers expose the UINTMAX_MAX / INTMAX_MAX limit macros.
    #define __STDC_LIMIT_MACROS
    
    #include <cstdio>
    #include <climits>
    
    #if __cplusplus >= 201103L // C++ 2011
        #include <cstdint>
    #else
    extern "C"
    {
        #include <stdint.h>
    }
    #endif // C++ 2011
    
    // Every integer type must fit into (unsigned) long long for the casts below.
    #if UINTMAX_MAX > ULLONG_MAX || INTMAX_MAX > LLONG_MAX || INTMAX_MIN < LLONG_MIN
        #error "Printfs may be broken"
    #endif
    
    // Widen any integer to long long / unsigned long long before printing.
    template<typename T> long long          I(T param) {return static_cast<long long> (param);}
    template<typename T> unsigned long long U(T param) {return static_cast<unsigned long long> (param);}
    
    int main()
    {
        int64_t  i = 1;
        uint64_t u = 1;
        printf("%lld %llu", I(i), U(u));
        return 0;
    }
    Well, it is broken on MinGW, but I see no good workaround for the bug.


    It would be nice to do it with a single template or macro, but I don't know if it's possible w/out explicit specializations.
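    With C++11 a single template can pick the wide type from the signedness of its argument, without explicit specializations. This is only a sketch: `LL` is a hypothetical name, it needs `<type_traits>`, and so it does not help a C++03 build. The format string still has to say `%lld` or `%llu` to match.

```cpp
#include <type_traits>

// Picks long long for signed integer types and unsigned long long for
// unsigned ones, so a single function replaces the I()/U() pair.
// Requires C++11; `LL` is a hypothetical name.
template <typename T>
typename std::conditional<std::is_signed<T>::value, long long, unsigned long long>::type
LL(T param)
{
    typedef typename std::conditional<std::is_signed<T>::value,
                                      long long, unsigned long long>::type wide;
    return static_cast<wide>(param);
}
```

    So `LL(i)` for an `int64_t` returns `long long`, and `LL(u)` for a `uint64_t` returns `unsigned long long`.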
    Last edited by m^2; 13th October 2011 at 20:23.

  16. #16
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    As a quick comment to this %llu issue :

    I was surprised by the MinGW problem, since it was working fine for my own binaries (in spite of the compiler warnings).
    For example, i had no problem getting status message like :
    Regenerated size : 6537216000 Bytes

    Well, i just experienced the same MinGW incompatibility ... on windows XP.

    It seems that MinGW, when compiling under Windows Seven, does work fine with %llu.
    But for some reason, not on XP...
    Last edited by Cyan; 21st October 2011 at 23:06.

  17. #17
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Why don't you define a macro which uses native Windows API for such usages? I think, it's more appropriate.
    BIT Archiver homepage: www.osmanturan.com
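    One reading of the macro suggestion above is to switch the length modifier per platform: the Microsoft C runtime has long accepted `I64u` for unsigned 64-bit values, while C99 runtimes use `llu`. The sketch below assumes that reading; `FMT_U64` and `format_u64` are hypothetical names, and `std::snprintf` needs a C++11 standard library.

```cpp
#include <cstdio>
#include <string>

// Length modifier for unsigned 64-bit values. "I64u" is the Microsoft C
// runtime spelling; "llu" is the C99 one. FMT_U64 is a hypothetical name.
#if defined(_WIN32)
#  define FMT_U64 "I64u"
#else
#  define FMT_U64 "llu"
#endif

std::string format_u64(unsigned long long value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%" FMT_U64, value);
    return buffer;
}
```

    MinGW also offers its own C99-style printf when `__USE_MINGW_ANSI_STDIO` is defined to 1 before the first standard header, which may be an alternative to a per-platform macro.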

  18. #18
    Member przemoc's Avatar
    Join Date
    Aug 2011
    Location
    Poland
    Posts
    44
    Thanks
    3
    Thanked 23 Times in 13 Posts
    There is no reason to define own macros if such ones are already there. Check inttypes.h. In this case it would be PRId64 and PRIu64:

    Code:
    printf("%" PRId64 " %" PRIu64, i, u); // pass int64_t/uint64_t directly; no I()/U() casts

  19. #19
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by osmanturan View Post
    Why don't you define a macro which uses native Windows API for such usages? I think, it's more appropriate.
    Good idea. Though I'll keep using ostream for this benchmark.
    Quote Originally Posted by przemoc View Post
    There is no reason to define own macros if such ones are already there. Check inttypes.h. In this case it would be PRId64 and PRIu64:

    Code:
    printf("%" PRId64 " %" PRIu64, i, u); // pass int64_t/uint64_t directly; no I()/U() casts
    AFAIK it's not standard.

  20. #20
    Member przemoc's Avatar
    Join Date
    Aug 2011
    Location
    Poland
    Posts
    44
    Thanks
    3
    Thanked 23 Times in 13 Posts
    Quote Originally Posted by m^2 View Post
    AFAIK it's not standard.
    Quote Originally Posted by Wikipedia
    The headers <complex.h>, <fenv.h>, <inttypes.h>, <stdbool.h>, <stdint.h>, and <tgmath.h> were added with C99, a revision to the C Standard published in 1999.
    They are in *nix (libc6-dev in Linux) and MinGW(-W64). Don't know about VS, but it should be there too.

  21. #21
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by przemoc View Post
    They are in *nix (libc6-dev in Linux) and MinGW(-W64). Don't know about VS, but it should be there too.
    I verified it and it's indeed a mandatory part of C99 standard, thank you for suggestion.
    I didn't verify that it's in C++ because the standard is crazy expensive, but it is said to have cinttypes.
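    A minimal sketch of the inttypes.h route, assuming the C library ships the header. `format_pair` is a made-up name, and the `__STDC_FORMAT_MACROS` define is only there because some pre-C++11 C libraries hide the PRI* macros from C++ without it.

```cpp
#define __STDC_FORMAT_MACROS // needed by some pre-C++11 C libraries for PRI* macros
#include <inttypes.h>
#include <cstdio>
#include <string>

// Formats a signed and an unsigned 64-bit value with the standard macros.
// The values are passed as int64_t / uint64_t, the exact types that
// PRId64 / PRIu64 describe.
std::string format_pair(int64_t i, uint64_t u)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%" PRId64 " %" PRIu64, i, u);
    return buffer;
}
```

    Note the spaces around the macros: in C++11 and later, `"%"PRId64` written without a space is parsed as a user-defined literal suffix and may fail to compile.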

  22. #22
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    A new version is out. There are numerous bug fixes and pretty formatting contributed by Cyan (Thank you). The pretty formatting is default with the old one still available with -c switch.

  23. #23
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    While adding Cyan's pretty output, I ported it from printf to cout to stay consistent with the rest of the program. I screwed it up; it's fixed now. Also, I updated LZ4 to the latest version (r41).

  24. #24
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I updated LZ4 to r51 and LZ4HC to r10. Both are supposed to be faster, but I don't have any numbers ATM.

  25. #25
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Made a new version.

    Changes:
    -Brought back LZMAT
    -Brought back LZF
    -Brought back NRV*
    -Updated Snappy to 1.0.5
    -Updated LZ4 to r57
    -Improved handling of codecs with support for limited output size. Up to now I gave them a lot of space to avoid errors, though I expected them to save at least 1 sector. Now I have error handling (turned out to be trivial), so zlib and lzjb are expected to have better performance on at least partially incompressible data. The same with some codecs that are back (LZMAT, LZF) and with LZ4 (thanks to the update).
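    The "save at least 1 sector" rule from the last item can be sketched as below. This is a guess at the logic, not fsbench's internals; all three function names are hypothetical. The point is that a codec with an output limit can be told to give up as soon as its output would no longer fit in one sector less than the raw block.

```cpp
#include <cstddef>

// Rounds `bytes` up to a whole number of sectors.
std::size_t sectors_needed(std::size_t bytes, std::size_t sector_size)
{
    return (bytes + sector_size - 1) / sector_size;
}

// Size a block occupies on disk under the rule described above: the
// compressed form is kept only if it saves at least one sector;
// otherwise the block is stored uncompressed.
// Hypothetical helper; fsbench's internals may differ.
std::size_t stored_size(std::size_t raw_bytes, std::size_t compressed_bytes,
                        std::size_t sector_size)
{
    std::size_t raw_sectors  = sectors_needed(raw_bytes, sector_size);
    std::size_t comp_sectors = sectors_needed(compressed_bytes, sector_size);
    std::size_t used = comp_sectors < raw_sectors ? comp_sectors : raw_sectors;
    return used * sector_size;
}

// Largest output worth accepting from a codec that supports an output limit.
std::size_t output_limit(std::size_t raw_bytes, std::size_t sector_size)
{
    std::size_t raw_sectors = sectors_needed(raw_bytes, sector_size);
    return raw_sectors > 1 ? (raw_sectors - 1) * sector_size : 0;
}
```

    With a 4096-byte block and 512-byte sectors, a 3000-byte result is stored as 3072 bytes, a 3900-byte result is not worth keeping (4096 bytes either way), and the codec only needs 3584 bytes of output space.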

    I tried to add the compression from NTFS-3G, but failed. I get segfaults, don't know the reason, debugging costs more than I am willing to spend ATM.
    Also, I looked at adding snappy-c, but it relies heavily on linux+gcc and there seems to be no easy way to make it performance-portable because it makes heavy use of some asm headers.
    Disappointing.

  26. #26
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Another update.
    lzmat was broken, I fixed it.
    There was a bug with incorrect size calculation when the last block was incompressible, fixed that too.
    There's a major internal refactoring, maintainability of the code should improve.
    And a new feature: you can specify the exact codecs that you want to test.

  27. #27
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I got a report that the project doesn't compile on Clang / Linux. Hopefully fixed it.
    I made a couple of other minor fixes and enhancements too.

    Enough for this weekend.

  28. #28
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    A major update.
    - Now you can supply codecs with parameters, e.g. try all 9 modes of zlib. The interface is flexible, I (or anybody else) can add support for advanced parameters any time if that's needed.
    - I changed the internal benching logic, so it should be easier to use correctly
    - Updated LZ4 to r58
    - Support for all modes of LZO prepared by inikeep, not just the few available before
    - Support for stronger LZ4 modes (COMPRESSIONLEVEL=12 or 17)
    - changes in command line.

  29. #29
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Update.
    [+] added BriefLZ
    [+] added fastlz
    [+] added a couple of lzo modes
    [+] added tornado
    [+] added Yappy
    [+] automated architecture detection in cmake (just 32/64 bitness. I don't have anything but x86 / AMD64 to test anyway)
    [+] added changelog
    [-] removed unused LZHAM sources. I intend to readd LZHAM alpha 8, but currently there was no sense to keep it.
    [-] removed LZRW*. These have only historical value.
    [~] improved docs
    [~] updated LZ4 to r59
    [!] modified LZMAT's integer types to fix compilation errors on mingw64
    [!] ported BriefLZ to AMD64
    [!] fixed a bug: when compiling with gcc with optimized flags, c++ code would get just plain -O3 and no other fireworks
    [!] fixed nrv* compilation with MSVC9
    [!] fixed displaying of incorrect memcpy speed

    One of the newly added LZO modes, 1x1_15 is very interesting. It's another bit faster than 1x_1, really nice.
    Also, Yappy compiled with mingw64 is the fastest decompressor that I've seen on my CPU, though with other compilers it loses to LZ4. An interesting thing about it is that it uses SSE2 copying to speed things up. But I added an option in CMakeLists.txt to disable SSE2; in that mode I made Yappy use two 64-bit copy operations instead of one 128-bit. On my CPU both versions are equally fast. Could anybody tell me how it works on their machines?
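    For reference, the two copy variants being compared look roughly like this. It is a sketch, not Yappy's source: the function names and the `HAVE_SSE2_COPY` switch are made up, and the SSE2 path uses the standard `_mm_loadu_si128` / `_mm_storeu_si128` intrinsics from `<emmintrin.h>`.

```cpp
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#  include <emmintrin.h>
#  define HAVE_SSE2_COPY 1
#else
#  define HAVE_SSE2_COPY 0
#endif

// Copies 16 bytes as two 64-bit moves. memcpy with a constant size is
// normally compiled to plain loads and stores.
inline void copy16_2x64(unsigned char* dst, const unsigned char* src)
{
    std::memcpy(dst,     src,     8);
    std::memcpy(dst + 8, src + 8, 8);
}

// Copies 16 bytes as one unaligned 128-bit SSE2 move where available,
// falling back to the 2x64-bit version elsewhere.
inline void copy16_sse2(unsigned char* dst, const unsigned char* src)
{
#if HAVE_SSE2_COPY
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
#else
    copy16_2x64(dst, src);
#endif
}
```

    Both produce the same bytes; any speed difference depends on the CPU and on how the compiler schedules the surrounding decode loop, so it has to be measured per machine.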

  30. #30
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Update.
    [+] added LZHAM
    [+] added even more LZO modes
    [+] added RLE64
    [+] added LZV1
    [+] added LZP1
    [+] added miniz
    [~] updated LZ4hc to r12
    [~] updated LZ4 to r59

    Comments:
    This version has limited portability because of prerelease version of LZHAM. Among others, it doesn't compile with 32-bit mingw. This is to be fixed.

    RLE64 is (surprise, surprise) the fastest codec included. And the weakest as well.
    LZHAM is back, this time a new version. Still uselessly slow on small blocks, though may be better on larger ones.
    The new LZ4hc is much better than before.
    And finally miniz...it's an alternative Deflate implementation. I didn't do many tests, but:
    - decompressor is slower than that of zlib
    - compressor in mode 1 is much faster than zlib, but much weaker too. It's quite like quicklz, but 3x slower.
    - compressor in modes 2-3 is very nice, 2 is as fast as zlib 1, but a couple % stronger; 3 is much stronger than 2 and still plenty fast.
    - Higher modes aren't competitive with zlib.
    Overall I like miniz. Especially when used with zlib's decompressor.
