Results 1 to 14 of 14

Thread: Progamming language benchmarks with compression workloads

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts

    Progamming language benchmarks with compression workloads

    Hi all – I want to deploy a programming language comparison benchmark site, something much better and more valid than the Benchmarks Game site. It's mostly going to be small programs of course, microbenchmarks. I wonder if there are compression codecs that can be compared across programming languages. Do you know of any codecs that are implemented in more than one PL?

    The only one I know is deflate/gzip. Most implementations are in C, like zlib and libdeflate, but there is at least one Go implementation: Klaus Post's compress library. (I think many of his optimizations ended up in the Go standard library.)

    Is Pavlov's gzip written in C++ or C? I mean the one included in the 7-Zip utility.

    What other codecs have multiple implementations in different programming languages? Is it only gzip?

    Thanks.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,962
    Thanks
    294
    Thanked 1,293 Times in 734 Posts
    > Do you know of any codecs that are implemented in more than one PL?

    Only C/C++ have good compilers, so there's no point in that imho.
    But there're some java and C# rangecoder implementations which could be directly compared.

    Also its possible to compile C/C++ for wasm target (which can be JS), but that would be kinda a benchmark of LLVM only.

    > Is Pavlov's gzip written in C++ or C? I mean the one included in the 7-Zip utility.

    C++. I have a standalone version if necessary: http://nishi.dreamhosters.com/u/7zdeflate_v1.rar

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,495
    Thanks
    26
    Thanked 131 Times in 101 Posts
    I've done one: https://github.com/tarsa/TarsaLZP

    Also IIUC then https://github.com/flanglet/kanzi https://github.com/flanglet/kanzi-cpp https://github.com/flanglet/kanzi-go are the same algorithms implemented in different languages.

  4. #4
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    I've done one: https://github.com/tarsa/TarsaLZP

    Also IIUC then https://github.com/flanglet/kanzi https://github.com/flanglet/kanzi-cpp https://github.com/flanglet/kanzi-go are the same algorithms implemented in different languages.
    What's the deal with Kanzi? Is it a new codec?

  5. #5
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    158
    Thanks
    51
    Thanked 43 Times in 32 Posts
    https://encode.su/threads/2083-Kanzi-Java-Go-and-C-compressors

    ​You should be able to find implementations of LZ4 or snappy in different languages.

  6. #6
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Oh I forgot about it, forgot that I had posted in the thread.

  7. #7
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    it's implementation of lpaq1 written in Javascript. but format is not the same as original.
    Attached Files Attached Files

  8. #8
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    Quote Originally Posted by Shelwien View Post
    > Do you know of any codecs that are implemented in more than one PL?

    Only C/C++ have good compilers, so there's no point in that imho.
    But there're some java and C# rangecoder implementations which could be directly compared.

    Also its possible to compile C/C++ for wasm target (which can be JS), but that would be kinda a benchmark of LLVM only.

    > Is Pavlov's gzip written in C++ or C? I mean the one included in the 7-Zip utility.

    C++. I have a standalone version if necessary: http://nishi.dreamhosters.com/u/7zdeflate_v1.rar
    we also have Rust now. there are some reimplementations for commonly used algorithms:
    https://github.com/BurntSushi/rust-snappy
    https://github.com/main--/rust-lz-fear

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    https://github.com/klauspost/compress has several compressors reimplemented in Go

    but be careful - complex algos such as deflate and zstd, doesn't implement the same algo at the same compression level, you can find result comparison here: https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit#gid=1088551794

    (gzstd, gzkp are his Go implemenetations)

  10. #10
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    https://github.com/klauspost/compress has several compressors reimplemented in Go

    but be careful - complex algos such as deflate and zstd, doesn't implement the same algo at the same compression level, you can find result comparison here: https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit#gid=1088551794

    (gzstd, gzkp are his Go implemenetations)
    Right, I mentioned compress in my initial post above.

    He's done some nice work with SIMD or Go assembly. I wish Google would do more to optimize Go applications with their compiler. They prioritize build times over the users' time.

  11. #11
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    187
    Thanked 177 Times in 120 Posts
    I have rANS implementations in both C and JavaScript (node.js). Speed wise I think it was ~2-3 fold hit if I recall. Not totally mad. That's not got any LZ searching or BWT stuff going on though, being pure entropy encoder.

  12. #12
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by JamesB View Post
    I have rANS implementations in both C and JavaScript (node.js). Speed wise I think it was ~2-3 fold hit if I recall. Not totally mad. That's not got any LZ searching or BWT stuff going on though, being pure entropy encoder.
    Interesting. Did you do anything to speed up the JS, like use asm.js?

  13. #13
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    187
    Thanked 177 Times in 120 Posts
    Nope. I'm a total novice at JavaScript and it was my first real JS program, so likely has all sorts of inefficiencies.

    Infact the only reason I wrote the code in JavaScript was I was trying to do a fresh reimplementation based on the contents of pseudocode in a standards specification, and to act as an easily readable reference implementation. JavaScript was the easiest route because it lacks all that syntactic sugar which I normally demand from my programming languages - specifying types for example, or predeclaring variables.

    See https://github.com/jkbonfield/htscodecs/tree/master/javascript

    Note this use of rANS is a bit different to most other peoples. I'm using it with a static frequency table, which is where most people use tANS instead as the table-driven nature naturally fits static frequencies. I *should* have done the same, but for historic reasons I initial explorations into ANS were rANS as I could get my head around it more and I stuck with it. The lower levels functions in there though could be used for more dynamic updates.

  14. #14
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    187
    Thanked 177 Times in 120 Posts
    Actually I decided to benchmark it again. The more complex formats such as adaptive order-1 range coding are quite competitive:

    Code:
    $ time node ../javascript/main_arith_gen.js -o 1 /dev/shm/enwik8 > /dev/shm/_.enc; time node ../javascript/main_arith_gen.js -d /dev/shm/_.enc > /dev/shm/_.out; ls -l /dev/shm
    Compress order 1, 100000000 => 47945026
    
    real    0m5.820s
    user    0m5.734s
    sys     0m0.096s
    Decompress 47945410 => 100000000
    
    real    0m6.117s
    user    0m6.044s
    sys     0m0.084s
    total 242144
    -rw-r--r-- 1 jkb team117  47945410 Jun 17 21:06 _.enc
    -rw-r--r-- 1 jkb team117 100000000 Jun 17 21:06 _.out
    -rw-r----- 1 jkb team117 100000000 Jun 17 21:06 enwik8
    vs C

    Code:
    time ./tests/arith_dynamic -o1 /dev/shm/enwik8 > /dev/shm/_.enc; time ./tests/arith_dynamic -d /dev/shm/_.enc > /dev/shm/_.out; ls -l /dev/shm
    Took 2617014 microseconds,  38.2 MB/s
    
    real    0m2.644s
    user    0m2.603s
    sys     0m0.039s
    Took 3203435 microseconds,  31.2 MB/s
    
    real    0m3.219s
    user    0m3.191s
    sys     0m0.029s
    total 242144
    -rw-r--r-- 1 jkb team117  47945550 Jun 17 21:07 _.enc
    -rw-r--r-- 1 jkb team117 100000000 Jun 17 21:07 _.out
    -rw-r----- 1 jkb team117 100000000 Jun 17 21:06 enwik8
    So only around 2x slower.

    The faster order-0 rans with static frequencies and heavily optimised C (including some inline asm to force it to use cmov), loop unrolling, etc, shows a bigger gap:

    Code:
    $ time node ../javascript/main_rans4x16.js -o 0 /dev/shm/enwik8 > /dev/shm/_.enc; time node ../javascript/main_rans4x16.js -d /dev/shm/_.enc > /dev/shm/_.out; ls -l /dev/shm
    Compress order 0, 100000000 => 63632618
    
    real    0m1.777s
    user    0m1.709s
    sys     0m0.084s
    Decompress 63632618 => 100000000
    
    real    0m1.018s
    user    0m0.946s
    sys     0m0.080s
    total 257464
    -rw-r--r-- 1 jkb team117  63632618 Jun 17 21:11 _.enc
    -rw-r--r-- 1 jkb team117 100000000 Jun 17 21:11 _.out
    -rw-r----- 1 jkb team117 100000000 Jun 17 21:06 enwik8
    vs C

    Code:
    $ time ./tests/rans4x16pr -o0 /dev/shm/enwik8 > /dev/shm/_.enc; time ./tests/rans4x16pr -d /dev/shm/_.enc > /dev/shm/_.out; ls -l /dev/shm
    Took 384988 microseconds, 259.7 MB/s
    
    real    0m0.405s
    user    0m0.394s
    sys     0m0.014s
    Took 184316 microseconds, 542.5 MB/s
    
    real    0m0.200s
    user    0m0.164s
    sys     0m0.036s
    total 257464
    -rw-r--r-- 1 jkb team117  63634172 Jun 17 21:10 _.enc
    -rw-r--r-- 1 jkb team117 100000000 Jun 17 21:10 _.out
    -rw-r----- 1 jkb team117 100000000 Jun 17 21:06 enwik8
    So that's 4-5x speed difference. It widens even more for order-1 rans, but as I say the javascript code has no loop unrolling and minimal nods to efficiency. It's strictly designed to be as close to the specification as possible and hence it's an easy to follow algorithmic piece. Unlike the C code which is a total mess!


    Eg the order-1 rans javascript decode inner loop is:

    Code:
        // Main decode loop
        var output = new Buffer.allocUnsafe(nbytes);
        for (var i = 0; i < nbytes; i++) {
            var i4 = i%4;
            var f = RansGetCumulativeFreq(R[i4], 12);
            var s = C2S[f]; // Equiv to RansGetSymbolFromFreq(C, f);
    
            output[i] = s;
            R[i4] = RansAdvanceStep(R[i4], C[s], F[s], 12);
            R[i4] = RansRenorm(src, R[i4]);
        }
    The C version of the same function is:

    Code:
        for (i = 0; cp < cp_end-8 && i < (out_sz&~7); i+=8) {
            for (j = 0; j < 8; j+=4) {
                RansState m0 = RansDecGet(&R[0], TF_SHIFT);
                RansState m1 = RansDecGet(&R[1], TF_SHIFT);
                R[0] = sfreq[m0] * (R[0] >> TF_SHIFT) + sbase[m0];
                R[1] = sfreq[m1] * (R[1] >> TF_SHIFT) + sbase[m1];
    
                RansDecRenorm(&R[0], &cp);
                RansDecRenorm(&R[1], &cp);
    
                out[i+j+0] = ssym[m0];
                out[i+j+1] = ssym[m1];
    
                RansState m3 = RansDecGet(&R[2], TF_SHIFT);
                RansState m4 = RansDecGet(&R[3], TF_SHIFT);
    
                R[2] = sfreq[m3] * (R[2] >> TF_SHIFT) + sbase[m3];
                R[3] = sfreq[m4] * (R[3] >> TF_SHIFT) + sbase[m4];
    
                out[i+j+2] = ssym[m3];
                out[i+j+3] = ssym[m4];
    
                RansDecRenorm(&R[2], &cp);
                RansDecRenorm(&R[3], &cp);
            }
        }
    
        // remainder
        for (; i < out_sz; i++) {
            RansState m = RansDecGet(&R[i%4], TF_SHIFT);
            R[i%4] = sfreq[m] * (R[i%4] >> TF_SHIFT) + sbase[m];
            out[i] = ssym[m];
            RansDecRenormSafe(&R[i%4], &cp, cp_end+8);
        }


    Credit to the V8 JIT engine that it gets remotely close.

Similar Threads

  1. Centmin Mod compression benchmarks
    By SolidComp in forum Data Compression
    Replies: 8
    Last Post: 3rd May 2020, 05:45
  2. Haskell Language got a new house
    By maadjordan in forum Data Compression
    Replies: 0
    Last Post: 16th April 2020, 19:48
  3. Multi-language text compression corpus?
    By Paul W. in forum Data Compression
    Replies: 13
    Last Post: 19th November 2015, 18:06
  4. Practical compression benchmarks
    By Matt Mahoney in forum Data Compression
    Replies: 36
    Last Post: 14th July 2013, 22:47
  5. Lisaac programming language
    By lunaris in forum The Off-Topic Lounge
    Replies: 0
    Last Post: 11th December 2008, 17:51

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •