Thread: Tree alpha v0.1 download

  1. #781
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 660 Times in 354 Posts
    As I already said, caches (as the name implies) are transparent to assembler programs, and therefore to programs in any higher-level language: they just transparently manage some part of the memory contents. The "volatile" specifier in C/C++ just tells the compiler never to "cache" this variable's data in registers: every read or write must go straight to memory.

    For example, the sequence "X=I; X++" for an ordinary variable may generate the asm commands

    load I to register
    increment register by 1
    store register to X

    while for a volatile X, each read/write of X must be executed explicitly, so the code should be

    load I to register
    store register to X
    load X to register
    increment register by 1
    store register to X

    Also, writes to volatile variables aren't reordered by the compiler, which guarantees that in "queue[128] = c; ptr = 128" (with both declared volatile), ptr will not be updated before c is stored.

    Of course, this was useless for single-threaded programs, except for device I/O. Later, caches were added, but they are transparent to single-threaded programs apart from improving performance. The CPU has privileged management commands that allow the OS to disable caching for parts of the memory space, which is used for device I/O. Later, SSE added non-caching store commands that may be used instead.


    The next part of the story is multi-threading. The simplest approach is to reuse volatile variables, which was made to work on x86 by implementing "full cache coherency", i.e. ensuring that all cores see memory writes in exactly the same order. This requires deep cooperation between the caches (individual to each core) and is therefore quite expensive. The alternative approach, implemented in some non-x86 systems, is to let the programmer state explicitly when cache state needs to be synchronized between cores: memory fences.

    So, to make your program portable to non-x86 systems, you may use volatile variables for queue+ptr and issue memory fences in between:
    writer: queue[128] = c; WRITE_FENCE(); ptr = 128
    reader: if (ptr == 128) {READ_FENCE(); c = queue[128]...}

    or you can use atomic reads/writes of ptr, which include the required memory fences. I'm not sure about the queue array, though. Either the compiler is smart enough to push any variables out of registers prior to atomic_write, or you still need "volatile". It's better to ask nemequ, who is definitely an expert in that area.
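
    For reference, the writer/reader pattern above could be sketched in C11 roughly as follows. This is only an illustration under assumptions: WRITE_FENCE()/READ_FENCE() are mapped onto atomic_thread_fence, and the names QUEUE_SIZE, writer_publish and reader_try_take are hypothetical. The queue array is a plain (non-volatile) array here, because the release/acquire fence pair already orders the data write before the index becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QUEUE_SIZE 256                    /* assumed size, for illustration only */

static unsigned char queue[QUEUE_SIZE];   /* plain data, not volatile */
static atomic_int ptr;                    /* published index, starts at 0 */

/* Writer thread: store the data, fence, then publish the index. */
static void writer_publish(int index, unsigned char c)
{
    assert(index > 0 && index < QUEUE_SIZE);
    queue[index] = c;
    atomic_thread_fence(memory_order_release);                /* WRITE_FENCE() */
    atomic_store_explicit(&ptr, index, memory_order_relaxed);
}

/* Reader thread: check the index, fence, then read the data. */
static bool reader_try_take(int index, unsigned char *out)
{
    if (atomic_load_explicit(&ptr, memory_order_relaxed) != index)
        return false;                                         /* not published yet */
    atomic_thread_fence(memory_order_acquire);                /* READ_FENCE() */
    *out = queue[index];
    return true;
}
```

    In a real program writer_publish and reader_try_take would run on different threads, with the reader polling until it returns true.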

  2. #782
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    If I understand things correctly, volatile is not necessary: an atomic store with release memory order is sufficient to ensure that all earlier writes in the current thread are visible to a thread that acquires the same atomic.

    I had forgotten about a board I designed about 10 years ago that used a ColdFire processor where the software engineers couldn't get the peripherals working correctly. When investigating the problem we eventually figured out that we had to explicitly disable the cache at the peripheral addresses to get the code to work right.

    I now have code working with acceptable performance - it's even a little faster than the released code. What I didn't realize when I first tried atomic was that just changing a variable to atomic causes the gcc compiler to use atomic loads and stores with sequentially consistent ordering even when the atomic variables were accessed with regular operations. This is why my initial attempts created executables that were a lot slower. After reading nemequ's posts many times and reading about atomics and memory order, it slowly started making sense. Now my development version of GLZAdecode does not use volatile, instead using three atomic uchars (flags) with release/acquire ordering. Hopefully nemequ will take a look when I release the code and let me know if that part looks okay.
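
    A minimal sketch of the flag scheme described above, assuming each atomic uchar guards one hand-off buffer. This is not the GLZAdecode source: NUM_BUFFERS, BUFFER_SIZE, produce and consume are hypothetical names, and what the three flags really protect is an assumption. The point is the explicit memory order arguments; writing "ready[slot] = 1;" instead would compile, but as a sequentially consistent store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_BUFFERS 3        /* assumed: one flag per buffer */
#define BUFFER_SIZE 4096     /* assumed buffer size */

static unsigned char buffers[NUM_BUFFERS][BUFFER_SIZE];
static size_t lengths[NUM_BUFFERS];
static atomic_uchar ready[NUM_BUFFERS];   /* 0 = empty, 1 = full */

/* Producer thread: fill the buffer, then release-store the flag. */
static bool produce(int slot, const unsigned char *data, size_t len)
{
    assert(slot >= 0 && slot < NUM_BUFFERS);
    if (len > BUFFER_SIZE)
        return false;
    if (atomic_load_explicit(&ready[slot], memory_order_acquire) != 0)
        return false;                     /* consumer has not emptied it yet */
    memcpy(buffers[slot], data, len);
    lengths[slot] = len;
    atomic_store_explicit(&ready[slot], 1, memory_order_release);
    return true;
}

/* Consumer thread: acquire-load the flag, then read the buffer. */
static bool consume(int slot, unsigned char *out, size_t *len)
{
    assert(slot >= 0 && slot < NUM_BUFFERS);
    if (atomic_load_explicit(&ready[slot], memory_order_acquire) != 1)
        return false;                     /* nothing published yet */
    memcpy(out, buffers[slot], lengths[slot]);
    *len = lengths[slot];
    atomic_store_explicit(&ready[slot], 0, memory_order_release);
    return true;
}
```

    The acquire load in produce pairs with the release store in consume, so the producer never overwrites a buffer that the consumer is still reading.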

  3. #783
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    Sportman, this should fix the problem.
    Input:
    .glzf file

    Output:
    71,219,529 bytes

    GLZAdecode crashes with 247,135,749 bytes written to disk so far. Last console output:

    ...
    Read 47405254 of 47405254 symbols, start 0.0000
    Common prefix scan 0 - 4892ff, score[0 - 144] = 0.00034 - 0.00000
    1375: 47401692 syms, dict. size 4756088, 16.9815 bits/sym, o0e 100618862 bytes
    Read 47401692 of 47401692 symbols, start 0.0000
    Common prefix scan 0 - 489377, score[0 - 118] = 0.00041 - 0.00000
    1376: 47398508 syms, dict. size 4756188, 16.9827 bits/sym, o0e 100619128 bytes
    Read 47398508 of 47398508 symbols, start 0.0000
    Common prefix scan 0 - 4893db, score[0 - 113] = 0.00019 - 0.00000
    1377: 47394331 syms, dict. size 4756294, 16.9842 bits/sym, o0e 100619413 bytes
    Read 47394331 of 47394331 symbols, start 0.0000
    Common prefix scan 0 - 489445, score[0 - 158] = 0.00024 - 0.00000
    1378: 47391131 syms, dict. size 4756442, 16.9854 bits/sym, o0e 100619821 bytes
    Read 47391131 of 47391131 symbols, start 0.0000
    Common prefix scan 0 - 4894d9, score[0 - 148] = 0.00018 - 0.00000
    1379: 47389018 syms, dict. size 4756585, 16.9863 bits/sym, o0e 100620219 bytes
    Read 47389018 of 47389018 symbols, start 0.0000
    Common prefix scan 0 - 489568, score[0 - 63] = 0.00010 - 0.00000
    1380: 47387516 syms, dict. size 4756648, 16.9868 bits/sym, o0e 100620394 bytes
    Read 47387516 of 47387516 symbols, start 0.0000
    Common prefix scan 0 - 4895a7, score[0 - 119] = 0.00016 - 0.00000
    1381: 47386843 syms, dict. size 4756765, 16.9871 bits/sym, o0e 100620727 bytes
    Read 47386843 of 47386843 symbols, start 0.0000
    Common prefix scan 0 - 48961c, score[0 - 24] = 0.00199 - 0.00000
    1382: 47385806 syms, dict. size 4756789, 16.9875 bits/sym, o0e 100620795 bytes
    Read 47385806 of 47385806 symbols, start 0.0000
    Common prefix scan 0 - 489634, score[0 - 39] = 0.00017 - 0.00000
    1383: 47385590 syms, dict. size 4756828, 16.9876 bits/sym, o0e 100620902 bytes
    Read 47385590 of 47385590 symbols, start 0.0000
    Common prefix scan 0 - 48965b, score[0 - 3] = 0.00002 - 0.00000
    1384: 47385576 syms, dict. size 4756831, 16.9876 bits/sym, o0e 100620910 bytes
    Read 47385576 of 47385576 symbols, start 0.0000
    4756831 grammar productions created in 417590.688 seconds.

    GLZAencode html.txt.glzc html.txt.glze
    cap encoded 1, UTF8 compliant 0
    Read 47385576 symbols including 4756831 definition symbols
    Parsed 23155084 level 0 symbols
    use_mtf 1, mcl 23 mrcl 21
    Encoded 23155084 level 1 symbols
    Reduced 28589 grammar rules
    Compressed file size: 71219529 bytes. 4728243 grammar rules. Grammar size: 47328398 symbols
    Grammar encoding time = 17.609 seconds.

    GLZAdecode html.txt.glze html.txt.glzd
    234358102 (then crash)

  4. #784
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Sportman
    GLZAdecode html.txt.glze html.txt.glzd
    234358102 (then crash)
    I think the file is probably still causing an overflow in the dictionary. This is completely solved in my working version. Can you try this version of GLZAdecode to confirm that is the problem?

  5. #785
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    I think the file is probably still causing an overflow in the dictionary. This is completely solved in my working version. Can you try this version of GLZAdecode to confirm that is the problem?
    Congratulations you solved it! compare ok:

    GLZAdecode html.txt.glze html.txt.glzd
    Decompressed 1431632189 bytes in 10.937 seconds

    Format timing:

    GLZAformat html.txt html.txt.glzf
    Reading 1431632189 byte file
    Converting textual data
    Wrote 1 byte header and 1488089163 data bytes in 7.781 seconds.

  7. #786
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by Sportman
    Congratulations you solved it! compare ok:

    GLZAdecode html.txt.glze html.txt.glzd
    Decompressed 1431632189 bytes in 10.937 seconds

    Format timing:

    GLZAformat html.txt html.txt.glzf
    Reading 1431632189 byte file
    Converting textual data
    Wrote 1 byte header and 1488089163 data bytes in 7.781 seconds.
    Thank you for the help, Sportman! It's not so easy with files that can't be shared. Here are the results with corrected GLZA:

    Input:
    1,431,632,189 bytes - HTML text

    Output:
    177,899,364 bytes - zstd 22
    172,474,317 bytes - bro 11
    134,856,213 bytes - freearc ultra
    121,396,303 bytes - lzturbo 32
    100,389,150 bytes - bsc 2
    89,429,996 bytes - rar max
    86,881,088 bytes - bce
    79,840,886 bytes - 7z ultra
    71,219,529 bytes - glza

    So compression ratio and decompression time are good, but the compression time of almost 5 days is pretty ridiculous. In the long run this can be dramatically improved, but it will take time to do it properly. For now I think the important part is that using grammar rules instead of byte offsets (LZ77) provides a noticeably better compression ratio on the test file.

  8. #787
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.6

    GLZA v0.6 has the following changes compared to v0.5:

    Bug fixes for >1GB files and large dictionaries (Sportman) and GLZAencode crash (rare)
    Fixed capital lock and mtf code errors that caused a slight penalty in compression ratio.
    Extended delta filter to check for strides of up to 100.
    Tweaked a model.
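
    For readers unfamiliar with stride delta filters, here is a minimal sketch of the general idea, not GLZA's actual filter. Each byte is replaced by its difference from the byte "stride" positions earlier, which turns fixed-size records with slowly changing fields into runs of small values. The names delta_encode, delta_decode and pick_stride are hypothetical, and the scoring in pick_stride (count of bytes equal to their predecessor at that stride) is an assumed heuristic chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STRIDE 100   /* matches the "strides of up to 100" limit above */

/* In-place encode: walk backwards so each source byte is still unmodified. */
static void delta_encode(uint8_t *buf, size_t len, size_t stride)
{
    assert(stride > 0);
    for (size_t i = len; i-- > stride; )
        buf[i] = (uint8_t)(buf[i] - buf[i - stride]);
}

/* In-place decode: walk forwards so each source byte is already restored. */
static void delta_decode(uint8_t *buf, size_t len, size_t stride)
{
    assert(stride > 0);
    for (size_t i = stride; i < len; i++)
        buf[i] = (uint8_t)(buf[i] + buf[i - stride]);
}

/* Assumed heuristic: pick the stride with the most exact byte repeats. */
static size_t pick_stride(const uint8_t *buf, size_t len)
{
    size_t best_stride = 1, best_score = 0;
    for (size_t stride = 1; stride <= MAX_STRIDE && stride < len; stride++) {
        size_t score = 0;
        for (size_t i = stride; i < len; i++)
            score += (buf[i] == buf[i - stride]);
        if (score > best_score) {
            best_score = score;
            best_stride = stride;
        }
    }
    return best_stride;
}
```

    A real filter would also need to decide whether filtering helps at all and to record the chosen stride for the decoder.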

    From nemequ's list:
    Uses atomic variables instead of volatile variables.
    More functions, fewer macros
    Uses C99 types
    Programs print a (small) help screen when no command line arguments are used
    Fixed exit return values

    GLZAcompress is a little slower. GLZAdecode is a little faster. The change is less than 3% in either case.

    Compression ratios are generally similar, on average slightly better for typical benchmark files. The most dramatic change in my test set is with kennedy.xls from Canterbury Corpus:

    on AMD A8-5500 3.2 GHz:
    v0.5: 1,029,744 -> 146,270 bytes in 4.3 seconds, 0.046 seconds to decode.
    v0.6: 1,029,744 -> 23,099 bytes in 7.5 seconds, 0.015 seconds to decode.


    I assume kennedy.xls is the same file that Charles Bloom calls lzt25. This is what he shows on cb's rants (http://cbloom.com/rants.html):

    Found another weird one where RAR filters do magic; lzt25 is super-structured 13-byte structs :
    lzt25.rar,40024 // <- WOW RAR filters!
    lzt25.nz,45397
    lzt25.7z,51942
    lzt25_lp2.7z,52579
    lzt25.LZNA,58903
    lzt25.zl8.LZNA,61582 // <- zl8 LZNA worse than zl6 - weird file
    lzt25.lzx21,63198
    lzt25.zstd060,64550 // <- ZStd does surprisingly well here, I thought you needed more reps on this file
    lzt25.brotli9,67856
    lzt25.Kraken,67986
    lzt25.brotli10,68472 // <- brotli10 worse than brotli9 !
    lzt25.BitKnit,92940 // <- BitKnit oddly struggling
    lzt25.mc-.rar,106423 // <- unfiltered RAR is the worst of the LZ's
    lzt25.z9.zip,209811
    lzt25.lz4xc4,324125
    lzt25.raw,1029744

  10. #788
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad
    GLZA v0.6
    Input:
    67,111,000 bytes - chrome_child.dll

    Output:
    25,942,102 bytes - bce
    23,401,099 bytes - zstd 22
    23,396,089 bytes - glza
    22,306,031 bytes - bro 11
    22,124,059 bytes - rar max
    19,641,152 bytes - freearc ultra
    18,716,820 bytes - 7z ultra


    Input:
    52,638,664 bytes - xul.dll

    Output:
    20,519,632 bytes - bce
    19,371,390 bytes - zstd 22
    19,007,845 bytes - glza
    18,418,048 bytes - bro 11
    18,311,943 bytes - rar max
    16,294,045 bytes - freearc ultra
    15,455,141 bytes - 7z ultra


    Input:
    50,708,664 bytes - libcef.dll

    Output:
    19,479,330 bytes - bce
    18,675,207 bytes - zstd 22
    18,198,201 bytes - glza
    17,865,765 bytes - bro 11
    17,212,605 bytes - rar max
    15,413,439 bytes - freearc ultra
    14,534,231 bytes - 7z ultra


    Input:
    25,338,368 bytes - icudt54.dll

    Output:
    9,443,665 bytes - glza
    8,226,926 bytes - bce
    7,362,616 bytes - rar max
    7,329,790 bytes - zstd 22
    6,522,117 bytes - bro 11
    6,108,062 bytes - 7z ultra
    5,962,531 bytes - freearc ultra


    Input:
    17,595,072 bytes - pepflashplayer32_22_0_0_192.dll

    Output:
    8,922,602 bytes - bce
    8,079,585 bytes - glza
    7,990,377 bytes - zstd 22
    7,632,298 bytes - bro 11
    7,532,777 bytes - rar max
    6,939,240 bytes - freearc ultra
    6,898,954 bytes - 7z ultra


    Input:
    14,984,992 bytes - QtWebKit4.dll

    Output:
    4,475,340 bytes - bce
    4,085,710 bytes - glza
    4,044,446 bytes - zstd 22
    3,861,756 bytes - bro 11
    3,555,295 bytes - rar max
    3,195,500 bytes - 7z ultra
    3,233,466 bytes - freearc ultra

  12. #789
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    As your results show, GLZA is not very effective on most .dll files compared to LZ77-based algorithms. I think it could do considerably better if it had support for production rules that can be used to implement local transformations and support for BCJ filters. These are not things I consider to be high priority, but they would be nice additions in the long run.

  13. #790
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by Kennon Conrad
    As your results show, GLZA is not very effective on most .dll files compared to LZ77-based algorithms. I think it could do considerably better if it had support for production rules that can be used to implement local transformations and support for BCJ filters. These are not things I consider to be high priority, but they would be nice additions in the long run.
    Zstd doesn't have an exe filter, yet it looks very close to GLZA.

    Would adding filters inside the algorithm be significantly better than pre-processing with them? I ask because it doesn't seem right to increase the complexity of the algorithm to handle such special cases.

  14. #791
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by m^3
    Zstd doesn't have an exe filter, yet it looks very close to GLZA.

    Would adding filters inside the algorithm be significantly better than pre-processing with them? I ask because it doesn't seem right to increase the complexity of the algorithm to handle such special cases.
    It seems that pre-processing would be better and much easier to implement. I could see where some "skip" rules might be better as part of the main algorithm but for things like BCJ and local delta filters they would be best implemented up front.
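
    To illustrate what an up-front BCJ-style pass looks like, here is a simplified sketch of the classic x86 "E8" transform. It is not taken from GLZA or from any particular archiver; production BCJ filters typically add further checks and handle more opcodes. The function name e8_transform and the helpers are hypothetical. The idea is that the relative operand of each CALL (opcode 0xE8) is replaced by an absolute target, so repeated calls to the same function become identical byte strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* encode = true:  relative operand -> absolute target (before compression)
 * encode = false: absolute target -> relative operand (after decompression)
 * Opcode bytes are never modified and operands are skipped, so both
 * directions visit exactly the same positions and the pass is reversible. */
static void e8_transform(uint8_t *buf, size_t len, bool encode)
{
    for (size_t i = 0; i + 5 <= len; ) {
        if (buf[i] != 0xE8) {
            i++;
            continue;
        }
        uint32_t value = read_le32(buf + i + 1);
        uint32_t next = (uint32_t)(i + 5);   /* offset of the next instruction */
        write_le32(buf + i + 1, encode ? value + next : value - next);
        i += 5;                              /* skip the operand bytes */
    }
}
```

    Because the pass runs before compression and is undone after decompression, the main algorithm does not need to know about it.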

  15. #792
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    150
    Thanks
    30
    Thanked 59 Times in 35 Posts
    Can you write it in another language? I'm hoping for PHP, JavaScript, or Python.

  16. #793
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by xezz
    Can you write it in another language? I'm hoping for PHP, JavaScript, or Python.
    Sorry, I don't know any of those programming languages. If anyone else is interested, I'd certainly be willing to answer questions about the C code.

  17. #794
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.7

    GLZA v0.7 performance is similar to GLZA v0.6, except that I fixed a problem that caused some symbols that appear 16 times in the production rules not to be considered for MTF encoding. So for files that use MTF, compressed file sizes are now anywhere from 0.1% better to slightly worse.

    The main change is that GLZA is now one program, GLZA. To compress a file, use GLZA c <infile> <outfile>. To decompress a file, use GLZA d <infile> <outfile>. No user options are currently supported. I want to figure out the .dll thing before I add option passing to the subfunctions.

  19. #795
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Input:
    75,481,052 bytes, text file with 4,343,633 unique domains from Wikipedia.

    Output:
    28,409,263 bytes - bro 11 (0.416 sec. decomp)
    28,387,045 bytes - rar max (0.526 sec. decomp)
    27,367,324 bytes - zstd 22 (0.327 sec. decomp)
    26,266,954 bytes - 7z ultra (1.213 sec. decomp)
    25,856,145 bytes - lzturbo 49 (1.715 sec. decomp)
    22,624,060 bytes - freearc ultra
    22,092,657 bytes - glza (1.688 sec. decomp)
    21,872,106 bytes - bce
    21,282,302 bytes - emma text max all dicts
    20,809,789 bytes - paq8pxd18 15
    20,486,693 bytes - cmix dict

  21. #796
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Same input file with lzbench:

    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio
    xz 5.2.2 -9              1.07 MB/s    63 MB/s    26264780  34.80
    zstd 0.7.1 -22           1.12 MB/s   256 MB/s    26382257  34.95
    csc 3.3 -5               2.21 MB/s    48 MB/s    26382631  34.95
    lzlib 1.7 -9             1.08 MB/s    47 MB/s    26438902  35.03
    lzma 9.38 -5             1.24 MB/s    70 MB/s    26829334  35.54
    tornado 0.6a -16         1.46 MB/s   149 MB/s    26847834  35.57
    csc 3.3 -3               4.79 MB/s    43 MB/s    27138268  35.95
    lzlib 1.7 -6             1.44 MB/s    47 MB/s    27272926  36.13
    xz 5.2.2 -6              1.50 MB/s    63 MB/s    27277528  36.14
    brotli 0.4.0 -11         0.57 MB/s   280 MB/s    27574511  36.53
    zstd 0.7.1 -18           2.27 MB/s   320 MB/s    27661719  36.65
    tornado 0.6a -13         5.12 MB/s   139 MB/s    28645105  37.95
    csc 3.3 -1                 10 MB/s    44 MB/s    29091723  38.54
    tornado 0.6a -10         3.55 MB/s   131 MB/s    29239265  38.74
    zstd 0.7.1 -15           6.53 MB/s   304 MB/s    29816308  39.50
    tornado 0.6a -7            11 MB/s   137 MB/s    30014469  39.76
    lzham 1.0 -d26 -1        2.03 MB/s   145 MB/s    30225678  40.04
    lzlib 1.7 -3             4.02 MB/s    42 MB/s    30380336  40.25
    brotli 0.4.0 -8          5.63 MB/s   338 MB/s    30524458  40.44
    zling 2016-01-10 -4        25 MB/s   118 MB/s    30530116  40.45
    zstd 0.7.1 -11             14 MB/s   306 MB/s    30615643  40.56
    zling 2016-01-10 -3        28 MB/s   118 MB/s    30798048  40.80
    zling 2016-01-10 -2        32 MB/s   116 MB/s    30965319  41.02
    zling 2016-01-10 -1        36 MB/s   111 MB/s    31182632  41.31
    xz 5.2.2 -3              3.42 MB/s    54 MB/s    31294612  41.46
    zstd 0.7.1 -8              29 MB/s   315 MB/s    31359720  41.55
    zling 2016-01-10 -0        42 MB/s   114 MB/s    31528082  41.77
    lzma 9.38 -4             7.32 MB/s    56 MB/s    32153474  42.60
    tornado 0.6a -6            32 MB/s   128 MB/s    32372581  42.89
    brotli 0.4.0 -5            17 MB/s   310 MB/s    32698630  43.32
    crush 1.0 -2             0.54 MB/s   207 MB/s    32730360  43.36
    xpack 2016-06-02 -9      7.98 MB/s   573 MB/s    32947155  43.65
    lzma 9.38 -2               11 MB/s    51 MB/s    33081532  43.83
    tornado 0.6a -5            42 MB/s   126 MB/s    33161180  43.93
    xpack 2016-06-02 -6        19 MB/s   569 MB/s    33355010  44.19
    zstd 0.7.1 -5              63 MB/s   307 MB/s    33442944  44.31
    crush 1.0 -1             3.38 MB/s   205 MB/s    33667337  44.60
    zlib 1.2.8 -9            8.68 MB/s   238 MB/s    34020428  45.07
    zstd 0.7.1 -2             134 MB/s   521 MB/s    34224538  45.34
    zlib 1.2.8 -6              19 MB/s   237 MB/s    34236668  45.36
    brotli 0.4.0 -2            74 MB/s   286 MB/s    34323549  45.47
    lzfse 2016-06-19           42 MB/s   478 MB/s    34442323  45.63
    lzlib 1.7 -0               21 MB/s    34 MB/s    34802127  46.11
    lzham 1.0 -d26 -0        7.08 MB/s   129 MB/s    35157859  46.58
    lzma 9.38 -0               16 MB/s    42 MB/s    35252635  46.70
    lz5hc 1.4.1 -15          2.13 MB/s   291 MB/s    35269096  46.73
    tornado 0.6a -4            84 MB/s   171 MB/s    35299464  46.77
    zstd 0.7.1 -1             197 MB/s   668 MB/s    35856840  47.50
    xz 5.2.2 -0                13 MB/s    39 MB/s    35998760  47.69
    tornado 0.6a -3           101 MB/s   162 MB/s    36441868  48.28
    lzsse2 2016-05-14 -17    7.22 MB/s  2387 MB/s    36669486  48.58
    lzsse2 2016-05-14 -12    7.16 MB/s  2386 MB/s    36669486  48.58
    lzsse2 2016-05-14 -6     7.21 MB/s  2388 MB/s    36669486  48.58
    xpack 2016-06-02 -1        92 MB/s   504 MB/s    37202360  49.29
    crush 1.0 -0               36 MB/s   194 MB/s    37228896  49.32
    lzsse4 2016-05-14 -6     9.46 MB/s  2530 MB/s    37511708  49.70
    lzsse4 2016-05-14 -12    9.40 MB/s  2530 MB/s    37511708  49.70
    lzsse4 2016-05-14 -17    9.38 MB/s  2528 MB/s    37511708  49.70
    lzsse8 2016-05-14 -6     8.86 MB/s  2571 MB/s    37513193  49.70
    lzsse8 2016-05-14 -12    8.96 MB/s  2572 MB/s    37513193  49.70
    lzsse8 2016-05-14 -17    8.88 MB/s  2571 MB/s    37513193  49.70
    lz5hc 1.4.1 -12          6.03 MB/s   360 MB/s    37802087  50.08
    brotli 0.4.0 -0           174 MB/s   245 MB/s    38005179  50.35
    zlib 1.2.8 -1              68 MB/s   260 MB/s    38263933  50.69
    ucl_nrv2e 1.03 -9        0.73 MB/s   225 MB/s    38564531  51.09
    ucl_nrv2d 1.03 -9        0.73 MB/s   225 MB/s    38779760  51.38
    ucl_nrv2e 1.03 -6        9.76 MB/s   208 MB/s    39575337  52.43
    ucl_nrv2b 1.03 -9        0.71 MB/s   219 MB/s    39628722  52.50
    ucl_nrv2b 1.03 -6        9.54 MB/s   210 MB/s    39785420  52.71
    ucl_nrv2d 1.03 -6        9.78 MB/s   209 MB/s    39813335  52.75
    density 0.12.5 beta -3    245 MB/s   215 MB/s    39942582  52.92
    lzo1x 2.09 -999          4.68 MB/s   349 MB/s    40012645  53.01
    lz5hc 1.4.1 -9             19 MB/s   422 MB/s    40200534  53.26
    lzo1z 2.09 -999          4.62 MB/s   337 MB/s    40241379  53.31
    lzo1b 2.09 -999          5.44 MB/s   454 MB/s    40242414  53.31
    lzsse8 2016-05-14 -1       13 MB/s  2426 MB/s    40469085  53.61
    lzsse4 2016-05-14 -1       14 MB/s  2371 MB/s    40548546  53.72
    lzo2a 2.09 -999            21 MB/s   322 MB/s    40860926  54.13
    lzo1y 2.09 -999          4.57 MB/s   353 MB/s    40884666  54.17
    lz4hc r131 -12             19 MB/s  2414 MB/s    41061711  54.40
    lz4hc r131 -16             19 MB/s  2329 MB/s    41061711  54.40
    lz4hc r131 -9              24 MB/s  2413 MB/s    41100051  54.45
    lzo1c 2.09 -999            17 MB/s   449 MB/s    41641030  55.17
    lzmat 1.01                 15 MB/s   241 MB/s    41894883  55.50
    lz4hc r131 -4              46 MB/s  2264 MB/s    42271114  56.00
    ucl_nrv2e 1.03 -1          31 MB/s   203 MB/s    42383482  56.15
    ucl_nrv2b 1.03 -1          31 MB/s   205 MB/s    42417041  56.20
    ucl_nrv2d 1.03 -1          32 MB/s   202 MB/s    42575130  56.41
    quicklz 1.5.0 -2          140 MB/s   395 MB/s    42798467  56.70
    lzo1f 2.09 -999            14 MB/s   347 MB/s    42880732  56.81
    brieflz 1.1.0              75 MB/s   131 MB/s    43301822  57.37
    density 0.12.5 beta -2    390 MB/s   377 MB/s    43685370  57.88
    gipfeli 2015-11-30        168 MB/s   289 MB/s    43756909  57.97
    yalz77 2015-09-19 -12      23 MB/s   232 MB/s    43978841  58.26
    quicklz 1.5.0 -3           36 MB/s   634 MB/s    44055449  58.37
    lzsse2 2016-05-14 -1       16 MB/s  2018 MB/s    44453573  58.89
    lzrw 15-Jul-1991 -5        99 MB/s   356 MB/s    44681359  59.20
    lzvn 2016-06-19            42 MB/s   676 MB/s    44963128  59.57
    yalz77 2015-09-19 -8       30 MB/s   230 MB/s    45167741  59.84
    lzo1b 2.09 -99             67 MB/s   425 MB/s    45241364  59.94
    lzo1c 2.09 -99             68 MB/s   455 MB/s    45441579  60.20
    lzo1a 2.09 -99             80 MB/s   439 MB/s    45732697  60.59
    lzg 1.0.8 -8             3.65 MB/s   369 MB/s    46047120  61.00
    lzo1 2.09 -99              80 MB/s   384 MB/s    46205348  61.21
    lzo1c 2.09 -9              99 MB/s   460 MB/s    46423044  61.50
    lzo1b 2.09 -9              98 MB/s   441 MB/s    46566869  61.69
    lz5 1.4.1                 126 MB/s   287 MB/s    46630732  61.78
    lzo1b 2.09 -6             136 MB/s   467 MB/s    46751918  61.94
    lzo1c 2.09 -6             125 MB/s   476 MB/s    46923883  62.17
    yalz77 2015-09-19 -4       41 MB/s   225 MB/s    47219256  62.56
    tornado 0.6a -2           143 MB/s   236 MB/s    47234262  62.58
    lz5hc 1.4.1 -4            117 MB/s   766 MB/s    47278114  62.64
    lz4hc r131 -1              87 MB/s  2284 MB/s    47322165  62.69
    lzrw 15-Jul-1991 -4       202 MB/s   361 MB/s    48003209  63.60
    density 0.12.5 beta -1    624 MB/s   789 MB/s    48244212  63.92
    quicklz 1.5.0 -1          283 MB/s   399 MB/s    48265909  63.94
    shrinker 0.1              234 MB/s   809 MB/s    48392802  64.11
    lzf 3.6 -1                220 MB/s   484 MB/s    48794557  64.64
    pithy 2011-12-24 -9       146 MB/s   800 MB/s    49682147  65.82
    lzo1b 2.09 -3             129 MB/s   417 MB/s    49704607  65.85
    lzg 1.0.8 -6               11 MB/s   362 MB/s    49954909  66.18
    lzrw 15-Jul-1991 -3       183 MB/s   386 MB/s    50240067  66.56
    lzo1c 2.09 -3             129 MB/s   447 MB/s    50360806  66.72
    fastlz 0.1 -2             213 MB/s   375 MB/s    50455768  66.85
    fastlz 0.1 -1             221 MB/s   379 MB/s    50524699  66.94
    blosclz 2015-11-10 -9     176 MB/s   562 MB/s    50953927  67.51
    pithy 2011-12-24 -6       177 MB/s   834 MB/s    51036884  67.62
    lzo1a 2.09 -1             182 MB/s   419 MB/s    51042307  67.62
    yalz77 2015-09-19 -1       60 MB/s   229 MB/s    51386478  68.08
    lzo1b 2.09 -1             132 MB/s   406 MB/s    51581314  68.34
    yappy 2014-03-22 -100      63 MB/s  2076 MB/s    51800889  68.63
    lzo1 2.09 -1              190 MB/s   362 MB/s    51865841  68.71
    lzo1c 2.09 -1             134 MB/s   452 MB/s    51954061  68.83
    yappy 2014-03-22 -10       78 MB/s  2059 MB/s    52313910  69.31
    lzo1x 2.09 -1             314 MB/s   420 MB/s    52537407  69.60
    lzo1f 2.09 -1             126 MB/s   393 MB/s    52735666  69.87
    lzrw 15-Jul-1991 -2       179 MB/s   384 MB/s    52777872  69.92
    lzrw 15-Jul-1991 -1       174 MB/s   380 MB/s    52779016  69.92
    yappy 2014-03-22 -1        92 MB/s  1959 MB/s    52980876  70.19
    lzg 1.0.8 -4               31 MB/s   381 MB/s    52999546  70.22
    lzo1y 2.09 -1             315 MB/s   430 MB/s    53441792  70.80
    lzo1x 2.09 -15            342 MB/s   429 MB/s    53545578  70.94
    lzf 3.6 -0                196 MB/s   445 MB/s    53607090  71.02
    pithy 2011-12-24 -3       277 MB/s  1036 MB/s    54288680  71.92
    snappy 1.1.3              261 MB/s   936 MB/s    54538734  72.25
    lzo1x 2.09 -12            371 MB/s   453 MB/s    55077565  72.97
    lzg 1.0.8 -1               59 MB/s   426 MB/s    55593761  73.65
    lz4 r131                  442 MB/s  2284 MB/s    56143775  74.38
    pithy 2011-12-24 -0       336 MB/s  1208 MB/s    56786078  75.23
    lzo1x 2.09 -11            403 MB/s   511 MB/s    57152425  75.72
    wflz 2015-09-16           123 MB/s   702 MB/s    58464473  77.46
    tornado 0.6a -1           201 MB/s   308 MB/s    58877103  78.00
    lzjb 2010                 206 MB/s   413 MB/s    59912482  79.37
    lz4fast r131 -3           574 MB/s  2211 MB/s    61066462  80.90
    lz5hc 1.4.1 -1            449 MB/s  1597 MB/s    63586753  84.24
    lz4fast r131 -17         1506 MB/s  4960 MB/s    73101711  96.85
    blosclz 2015-11-10 -6     196 MB/s  6493 MB/s    75481052 100.00
    blosclz 2015-11-10 -3     527 MB/s  6482 MB/s    75481052 100.00
    blosclz 2015-11-10 -1     985 MB/s  6494 MB/s    75481052 100.00

  23. #797
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Do you know why the brotli -11 result is different?

    I suppose glza obtains better compression ratios than other LZ compressors on this file because it is better at handling frequently recurring strings with unstable match distances.

  24. #798
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    233
    Thanks
    92
    Thanked 47 Times in 31 Posts
    Quote Originally Posted by Sportman
    Same input file with lzbench:

    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio
    xz 5.2.2 -9              1.07 MB/s    63 MB/s    26264780  34.80
    zstd 0.7.1 -22           1.12 MB/s   256 MB/s    26382257  34.95
    csc 3.3 -5               2.21 MB/s    48 MB/s    26382631  34.95
    lzlib 1.7 -9             1.08 MB/s    47 MB/s    26438902  35.03
    lzma 9.38 -5             1.24 MB/s    70 MB/s    26829334  35.54
    tornado 0.6a -16         1.46 MB/s   149 MB/s    26847834  35.57
    csc 3.3 -3               4.79 MB/s    43 MB/s    27138268  35.95
    lzlib 1.7 -6             1.44 MB/s    47 MB/s    27272926  36.13
    xz 5.2.2 -6              1.50 MB/s    63 MB/s    27277528  36.14
    brotli 0.4.0 -11         0.57 MB/s   280 MB/s    27574511  36.53
    zstd 0.7.1 -18           2.27 MB/s   320 MB/s    27661719  36.65
    tornado 0.6a -13         5.12 MB/s   139 MB/s    28645105  37.95
    csc 3.3 -1                 10 MB/s    44 MB/s    29091723  38.54
    tornado 0.6a -10         3.55 MB/s   131 MB/s    29239265  38.74
    zstd 0.7.1 -15           6.53 MB/s   304 MB/s    29816308  39.50
    tornado 0.6a -7            11 MB/s   137 MB/s    30014469  39.76
    lzham 1.0 -d26 -1        2.03 MB/s   145 MB/s    30225678  40.04
    lzlib 1.7 -3             4.02 MB/s    42 MB/s    30380336  40.25
    brotli 0.4.0 -8          5.63 MB/s   338 MB/s    30524458  40.44
    zling 2016-01-10 -4        25 MB/s   118 MB/s    30530116  40.45
    zstd 0.7.1 -11             14 MB/s   306 MB/s    30615643  40.56
    zling 2016-01-10 -3        28 MB/s   118 MB/s    30798048  40.80
    zling 2016-01-10 -2        32 MB/s   116 MB/s    30965319  41.02
    zling 2016-01-10 -1        36 MB/s   111 MB/s    31182632  41.31
    xz 5.2.2 -3              3.42 MB/s    54 MB/s    31294612  41.46
    zstd 0.7.1 -8              29 MB/s   315 MB/s    31359720  41.55
    zling 2016-01-10 -0        42 MB/s   114 MB/s    31528082  41.77
    lzma 9.38 -4             7.32 MB/s    56 MB/s    32153474  42.60
    tornado 0.6a -6            32 MB/s   128 MB/s    32372581  42.89
    brotli 0.4.0 -5            17 MB/s   310 MB/s    32698630  43.32
    crush 1.0 -2             0.54 MB/s   207 MB/s    32730360  43.36
    xpack 2016-06-02 -9      7.98 MB/s   573 MB/s    32947155  43.65
    lzma 9.38 -2               11 MB/s    51 MB/s    33081532  43.83
    tornado 0.6a -5            42 MB/s   126 MB/s    33161180  43.93
    xpack 2016-06-02 -6        19 MB/s   569 MB/s    33355010  44.19
    zstd 0.7.1 -5              63 MB/s   307 MB/s    33442944  44.31
    crush 1.0 -1             3.38 MB/s   205 MB/s    33667337  44.60
    zlib 1.2.8 -9            8.68 MB/s   238 MB/s    34020428  45.07
    zstd 0.7.1 -2             134 MB/s   521 MB/s    34224538  45.34
    zlib 1.2.8 -6              19 MB/s   237 MB/s    34236668  45.36
    brotli 0.4.0 -2            74 MB/s   286 MB/s    34323549  45.47
    lzfse 2016-06-19           42 MB/s   478 MB/s    34442323  45.63
    lzlib 1.7 -0               21 MB/s    34 MB/s    34802127  46.11
    lzham 1.0 -d26 -0        7.08 MB/s   129 MB/s    35157859  46.58
    lzma 9.38 -0               16 MB/s    42 MB/s    35252635  46.70
    lz5hc 1.4.1 -15          2.13 MB/s   291 MB/s    35269096  46.73
    tornado 0.6a -4            84 MB/s   171 MB/s    35299464  46.77
    zstd 0.7.1 -1             197 MB/s   668 MB/s    35856840  47.50
    xz 5.2.2 -0                13 MB/s    39 MB/s    35998760  47.69
    tornado 0.6a -3           101 MB/s   162 MB/s    36441868  48.28
    lzsse2 2016-05-14 -17    7.22 MB/s  2387 MB/s    36669486  48.58
    lzsse2 2016-05-14 -12    7.16 MB/s  2386 MB/s    36669486  48.58
    lzsse2 2016-05-14 -6     7.21 MB/s  2388 MB/s    36669486  48.58
    xpack 2016-06-02 -1        92 MB/s   504 MB/s    37202360  49.29
    crush 1.0 -0               36 MB/s   194 MB/s    37228896  49.32
    lzsse4 2016-05-14 -6     9.46 MB/s  2530 MB/s    37511708  49.70
    lzsse4 2016-05-14 -12    9.40 MB/s  2530 MB/s    37511708  49.70
    lzsse4 2016-05-14 -17    9.38 MB/s  2528 MB/s    37511708  49.70
    lzsse8 2016-05-14 -6     8.86 MB/s  2571 MB/s    37513193  49.70
    lzsse8 2016-05-14 -12    8.96 MB/s  2572 MB/s    37513193  49.70
    lzsse8 2016-05-14 -17    8.88 MB/s  2571 MB/s    37513193  49.70
    lz5hc 1.4.1 -12          6.03 MB/s   360 MB/s    37802087  50.08
    brotli 0.4.0 -0           174 MB/s   245 MB/s    38005179  50.35
    zlib 1.2.8 -1              68 MB/s   260 MB/s    38263933  50.69
    ucl_nrv2e 1.03 -9        0.73 MB/s   225 MB/s    38564531  51.09
    ucl_nrv2d 1.03 -9        0.73 MB/s   225 MB/s    38779760  51.38
    ucl_nrv2e 1.03 -6        9.76 MB/s   208 MB/s    39575337  52.43
    ucl_nrv2b 1.03 -9        0.71 MB/s   219 MB/s    39628722  52.50
    ucl_nrv2b 1.03 -6        9.54 MB/s   210 MB/s    39785420  52.71
    ucl_nrv2d 1.03 -6        9.78 MB/s   209 MB/s    39813335  52.75
    density 0.12.5 beta -3    245 MB/s   215 MB/s    39942582  52.92
    lzo1x 2.09 -999          4.68 MB/s   349 MB/s    40012645  53.01
    lz5hc 1.4.1 -9             19 MB/s   422 MB/s    40200534  53.26
    lzo1z 2.09 -999          4.62 MB/s   337 MB/s    40241379  53.31
    lzo1b 2.09 -999          5.44 MB/s   454 MB/s    40242414  53.31
    lzsse8 2016-05-14 -1       13 MB/s  2426 MB/s    40469085  53.61
    lzsse4 2016-05-14 -1       14 MB/s  2371 MB/s    40548546  53.72
    lzo2a 2.09 -999            21 MB/s   322 MB/s    40860926  54.13
    lzo1y 2.09 -999          4.57 MB/s   353 MB/s    40884666  54.17
    lz4hc r131 -12             19 MB/s  2414 MB/s    41061711  54.40
    lz4hc r131 -16             19 MB/s  2329 MB/s    41061711  54.40
    lz4hc r131 -9              24 MB/s  2413 MB/s    41100051  54.45
    lzo1c 2.09 -999            17 MB/s   449 MB/s    41641030  55.17
    lzmat 1.01                 15 MB/s   241 MB/s    41894883  55.50
    lz4hc r131 -4              46 MB/s  2264 MB/s    42271114  56.00
    ucl_nrv2e 1.03 -1          31 MB/s   203 MB/s    42383482  56.15
    ucl_nrv2b 1.03 -1          31 MB/s   205 MB/s    42417041  56.20
    ucl_nrv2d 1.03 -1          32 MB/s   202 MB/s    42575130  56.41
    quicklz 1.5.0 -2          140 MB/s   395 MB/s    42798467  56.70
    lzo1f 2.09 -999            14 MB/s   347 MB/s    42880732  56.81
    brieflz 1.1.0              75 MB/s   131 MB/s    43301822  57.37
    density 0.12.5 beta -2    390 MB/s   377 MB/s    43685370  57.88
    gipfeli 2015-11-30        168 MB/s   289 MB/s    43756909  57.97
    yalz77 2015-09-19 -12      23 MB/s   232 MB/s    43978841  58.26
    quicklz 1.5.0 -3           36 MB/s   634 MB/s    44055449  58.37
    lzsse2 2016-05-14 -1       16 MB/s  2018 MB/s    44453573  58.89
    lzrw 15-Jul-1991 -5        99 MB/s   356 MB/s    44681359  59.20
    lzvn 2016-06-19            42 MB/s   676 MB/s    44963128  59.57
    yalz77 2015-09-19 -8       30 MB/s   230 MB/s    45167741  59.84
    lzo1b 2.09 -99             67 MB/s   425 MB/s    45241364  59.94
    lzo1c 2.09 -99             68 MB/s   455 MB/s    45441579  60.20
    lzo1a 2.09 -99             80 MB/s   439 MB/s    45732697  60.59
    lzg 1.0.8 -8             3.65 MB/s   369 MB/s    46047120  61.00
    lzo1 2.09 -99              80 MB/s   384 MB/s    46205348  61.21
    lzo1c 2.09 -9              99 MB/s   460 MB/s    46423044  61.50
    lzo1b 2.09 -9              98 MB/s   441 MB/s    46566869  61.69
    lz5 1.4.1                 126 MB/s   287 MB/s    46630732  61.78
    lzo1b 2.09 -6             136 MB/s   467 MB/s    46751918  61.94
    lzo1c 2.09 -6             125 MB/s   476 MB/s    46923883  62.17
    yalz77 2015-09-19 -4       41 MB/s   225 MB/s    47219256  62.56
    tornado 0.6a -2           143 MB/s   236 MB/s    47234262  62.58
    lz5hc 1.4.1 -4            117 MB/s   766 MB/s    47278114  62.64
    lz4hc r131 -1              87 MB/s  2284 MB/s    47322165  62.69
    lzrw 15-Jul-1991 -4       202 MB/s   361 MB/s    48003209  63.60
    density 0.12.5 beta -1    624 MB/s   789 MB/s    48244212  63.92
    quicklz 1.5.0 -1          283 MB/s   399 MB/s    48265909  63.94
    shrinker 0.1              234 MB/s   809 MB/s    48392802  64.11
    lzf 3.6 -1                220 MB/s   484 MB/s    48794557  64.64
    pithy 2011-12-24 -9       146 MB/s   800 MB/s    49682147  65.82
    lzo1b 2.09 -3             129 MB/s   417 MB/s    49704607  65.85
    lzg 1.0.8 -6               11 MB/s   362 MB/s    49954909  66.18
    lzrw 15-Jul-1991 -3       183 MB/s   386 MB/s    50240067  66.56
    lzo1c 2.09 -3             129 MB/s   447 MB/s    50360806  66.72
    fastlz 0.1 -2             213 MB/s   375 MB/s    50455768  66.85
    fastlz 0.1 -1             221 MB/s   379 MB/s    50524699  66.94
    blosclz 2015-11-10 -9     176 MB/s   562 MB/s    50953927  67.51
    pithy 2011-12-24 -6       177 MB/s   834 MB/s    51036884  67.62
    lzo1a 2.09 -1             182 MB/s   419 MB/s    51042307  67.62
    yalz77 2015-09-19 -1       60 MB/s   229 MB/s    51386478  68.08
    lzo1b 2.09 -1             132 MB/s   406 MB/s    51581314  68.34
    yappy 2014-03-22 -100      63 MB/s  2076 MB/s    51800889  68.63
    lzo1 2.09 -1              190 MB/s   362 MB/s    51865841  68.71
    lzo1c 2.09 -1             134 MB/s   452 MB/s    51954061  68.83
    yappy 2014-03-22 -10       78 MB/s  2059 MB/s    52313910  69.31
    lzo1x 2.09 -1             314 MB/s   420 MB/s    52537407  69.60
    lzo1f 2.09 -1             126 MB/s   393 MB/s    52735666  69.87
    lzrw 15-Jul-1991 -2       179 MB/s   384 MB/s    52777872  69.92
    lzrw 15-Jul-1991 -1       174 MB/s   380 MB/s    52779016  69.92
    yappy 2014-03-22 -1        92 MB/s  1959 MB/s    52980876  70.19
    lzg 1.0.8 -4               31 MB/s   381 MB/s    52999546  70.22
    lzo1y 2.09 -1             315 MB/s   430 MB/s    53441792  70.80
    lzo1x 2.09 -15            342 MB/s   429 MB/s    53545578  70.94
    lzf 3.6 -0                196 MB/s   445 MB/s    53607090  71.02
    pithy 2011-12-24 -3       277 MB/s  1036 MB/s    54288680  71.92
    snappy 1.1.3              261 MB/s   936 MB/s    54538734  72.25
    lzo1x 2.09 -12            371 MB/s   453 MB/s    55077565  72.97
    lzg 1.0.8 -1               59 MB/s   426 MB/s    55593761  73.65
    lz4 r131                  442 MB/s  2284 MB/s    56143775  74.38
    pithy 2011-12-24 -0       336 MB/s  1208 MB/s    56786078  75.23
    lzo1x 2.09 -11            403 MB/s   511 MB/s    57152425  75.72
    wflz 2015-09-16           123 MB/s   702 MB/s    58464473  77.46
    tornado 0.6a -1           201 MB/s   308 MB/s    58877103  78.00
    lzjb 2010                 206 MB/s   413 MB/s    59912482  79.37
    lz4fast r131 -3           574 MB/s  2211 MB/s    61066462  80.90
    lz5hc 1.4.1 -1            449 MB/s  1597 MB/s    63586753  84.24
    lz4fast r131 -17         1506 MB/s  4960 MB/s    73101711  96.85
    blosclz 2015-11-10 -6     196 MB/s  6493 MB/s    75481052 100.00
    blosclz 2015-11-10 -3     527 MB/s  6482 MB/s    75481052 100.00
    blosclz 2015-11-10 -1     985 MB/s  6494 MB/s    75481052 100.00
    Was this supposed to include glza? I don't see it.

    Also, what are your system specs? What CPU instructions are you enabling in the compilation?
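For reading the quoted table: the Ratio column looks like compressed size as a percentage of the original size. The sketch below checks that against two rows, assuming the input file is 75481052 bytes (inferred from the blosclz rows that report 100.00, not stated anywhere in the post).

```python
# Assumed original size, taken from the rows whose Ratio is 100.00.
ORIGINAL_SIZE = 75_481_052


def ratio_percent(compressed_size: int, original_size: int = ORIGINAL_SIZE) -> float:
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_size / original_size


# Compressed sizes copied from the quoted lzbench table.
rows = [("xz 5.2.2 -9", 26_264_780), ("brotli 0.4.0 -11", 27_574_511)]
for name, size in rows:
    print(f"{name}: {ratio_percent(size):.2f}")
```

Lower is better under this reading, so the table is sorted from strongest to weakest compression.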

  25. #799
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Do you know why the brotli -11 result is different?
    Yes, I used older Brotli and Zstd versions; the latest Brotli and Zstd releases are missing Windows builds.

  26. The Following User Says Thank You to Sportman For This Useful Post:

    Kennon Conrad (13th August 2016)

  27. #800
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    778
    Thanks
    63
    Thanked 273 Times in 191 Posts
    Quote Originally Posted by SolidComp View Post
    Was this supposed to include glza? I don't see it.
    Also, what are your system specs? What CPU instructions are you enabling in the compilation?
    No, it was for a size comparison with my earlier manually run tests.

    System specs are listed under No. 79 here: http://www.mattmahoney.net/dc/text.html

    I used the lzbench v1.2 build from Jun 23, available here: https://github.com/inikep/lzbench/releases

  28. #801
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    682
    Thanks
    208
    Thanked 249 Times in 152 Posts
    Try brotli with the 16 MB window if the others are not limited to 4 MB either.
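Why the window limit matters can be shown with a small experiment. This sketch uses Python's standard `lzma` module as a stand-in, not brotli itself, and the dictionary sizes (4 KiB and 1 MiB) are scaled down for illustration. The test data is a random 64 KiB block repeated once, so the only redundancy sits 64 KiB back.

```python
import lzma
import random


def xz_size(data: bytes, dict_size: int) -> int:
    """Compress with LZMA2 using the given dictionary (window) size in bytes."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


# Deterministic pseudo-random block, then the same block again 64 KiB later.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block + block

small = xz_size(data, 4 * 1024)     # window too small to reach the first copy
large = xz_size(data, 1024 * 1024)  # window covers the whole input
print(small, large)
```

With the small window the second copy cannot be matched, so the output stays close to the input size; with the large window it shrinks to roughly half. The same effect would penalize any compressor run with a smaller window than its competitors on a 75 MB file.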

  29. #802
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.7.1

    Compression ratios are unchanged from GLZA v0.7 and speed should be about the same. The code is now structured differently so that it is easier to integrate into other programs such as lzbench.

    Test results (top 25) for my prototype version of lzbench on the Canterbury Corpus and the Large Canterbury Corpus:

    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.19 MB/s    40 MB/s       40750  26.79 alice29.txt
    brotli 0.4.0 -11         0.84 MB/s   369 MB/s       46521  30.59 alice29.txt
    lzlib 1.7 -6             4.40 MB/s    58 MB/s       48333  31.78 alice29.txt
    xz 5.2.2 -9              4.04 MB/s    69 MB/s       48448  31.86 alice29.txt
    xz 5.2.2 -6              4.37 MB/s    71 MB/s       48448  31.86 alice29.txt
    lzlib 1.7 -9             4.12 MB/s    59 MB/s       48451  31.86 alice29.txt
    lzma 9.38 -5             3.63 MB/s    81 MB/s       48458  31.86 alice29.txt
    csc 3.3 -5               7.61 MB/s    59 MB/s       49019  32.23 alice29.txt
    zstd 0.8.0 -15           7.50 MB/s   700 MB/s       49701  32.68 alice29.txt
    csc 3.3 -3               8.89 MB/s    59 MB/s       49782  32.73 alice29.txt
    zstd 0.8.0 -22           6.41 MB/s   678 MB/s       49846  32.77 alice29.txt
    zstd 0.8.0 -18           6.44 MB/s   678 MB/s       49846  32.77 alice29.txt
    brotli 0.4.0 -8            15 MB/s   459 MB/s       51190  33.66 alice29.txt
    lzham 1.0 -d26 -1        3.79 MB/s   183 MB/s       51593  33.92 alice29.txt
    lzlib 1.7 -3             7.98 MB/s    55 MB/s       51698  33.99 alice29.txt
    zling 2016-01-10 -4        25 MB/s    79 MB/s       51699  33.99 alice29.txt
    zstd 0.8.0 -11             17 MB/s   707 MB/s       51869  34.10 alice29.txt
    zling 2016-01-10 -3        26 MB/s    79 MB/s       51916  34.14 alice29.txt
    tornado 0.6a -16         6.69 MB/s   188 MB/s       52009  34.20 alice29.txt
    xz 5.2.2 -3                11 MB/s    68 MB/s       52085  34.25 alice29.txt
    xpack 2016-06-02 -9        18 MB/s  1020 MB/s       52221  34.34 alice29.txt
    zling 2016-01-10 -2        28 MB/s    79 MB/s       52362  34.43 alice29.txt
    csc 3.3 -1                 22 MB/s    55 MB/s       52492  34.51 alice29.txt
    zstd 0.8.0 -8              33 MB/s   691 MB/s       52593  34.58 alice29.txt
    brotli 0.4.0 -5            33 MB/s   444 MB/s       52807  34.72 alice29.txt
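For anyone who wants to re-rank these results, the rows can be parsed with a short script. This is a sketch that assumes the text layout shown in the tables here (name, two "MB/s" columns, size, ratio, optional filename); the regex and the function name `parse_row` are my own and are not part of lzbench.

```python
import re

# Layout assumed from the tables in this thread, not from lzbench's source.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<comp>[\d.]+) MB/s\s+(?P<decomp>[\d.]+) MB/s\s+"
    r"(?P<size>\d+)\s+(?P<ratio>[\d.]+)(?:\s+(?P<file>\S+))?$"
)


def parse_row(line: str):
    """Parse one result row into a dict, or return None for headers and blanks."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return {
        "name": m["name"],
        "comp_mbs": float(m["comp"]),
        "decomp_mbs": float(m["decomp"]),
        "size": int(m["size"]),
        "ratio": float(m["ratio"]),
        "file": m["file"],
    }


# Rows copied from the alice29.txt table.
text = """\
Compressor name         Compress. Decompress. Compr. size  Ratio Filename
brotli 0.4.0 -11         0.84 MB/s   369 MB/s       46521  30.59 alice29.txt
glza 0.7.1               0.19 MB/s    40 MB/s       40750  26.79 alice29.txt
lzlib 1.7 -6             4.40 MB/s    58 MB/s       48333  31.78 alice29.txt
"""
rows = [r for r in map(parse_row, text.splitlines()) if r is not None]
rows.sort(key=lambda r: r["size"])
print([r["name"] for r in rows])
```

Sorting by `decomp_mbs` instead gives the speed ranking, which looks quite different from the size ranking in these tables.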
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.18 MB/s    42 MB/s       38265  30.57 asyoulik.txt
    brotli 0.4.0 -11         0.89 MB/s   318 MB/s       42749  34.15 asyoulik.txt
    lzma 9.38 -5             3.66 MB/s    71 MB/s       44485  35.54 asyoulik.txt
    lzlib 1.7 -9             4.40 MB/s    52 MB/s       44489  35.54 asyoulik.txt
    xz 5.2.2 -6              4.49 MB/s    63 MB/s       44489  35.54 asyoulik.txt
    xz 5.2.2 -9              4.10 MB/s    61 MB/s       44489  35.54 asyoulik.txt
    lzlib 1.7 -6             4.70 MB/s    52 MB/s       44519  35.56 asyoulik.txt
    csc 3.3 -5               8.23 MB/s    51 MB/s       45769  36.56 asyoulik.txt
    zstd 0.8.0 -22           7.02 MB/s   629 MB/s       45816  36.60 asyoulik.txt
    zstd 0.8.0 -18           7.03 MB/s   629 MB/s       45816  36.60 asyoulik.txt
    zstd 0.8.0 -15           7.55 MB/s   625 MB/s       45975  36.73 asyoulik.txt
    csc 3.3 -3               9.15 MB/s    51 MB/s       46093  36.82 asyoulik.txt
    lzlib 1.7 -3             7.16 MB/s    50 MB/s       46288  36.98 asyoulik.txt
    zling 2016-01-10 -4        22 MB/s    68 MB/s       46515  37.16 asyoulik.txt
    zling 2016-01-10 -3        23 MB/s    68 MB/s       46534  37.17 asyoulik.txt
    brotli 0.4.0 -8            15 MB/s   406 MB/s       46723  37.32 asyoulik.txt
    zling 2016-01-10 -1        25 MB/s    68 MB/s       47031  37.57 asyoulik.txt
    zstd 0.8.0 -11             15 MB/s   651 MB/s       47238  37.74 asyoulik.txt
    zling 2016-01-10 -2        24 MB/s    68 MB/s       47261  37.75 asyoulik.txt
    lzham 1.0 -d26 -1        4.00 MB/s   162 MB/s       47326  37.81 asyoulik.txt
    zling 2016-01-10 -0        25 MB/s    67 MB/s       47351  37.83 asyoulik.txt
    xz 5.2.2 -3                10 MB/s    60 MB/s       47437  37.90 asyoulik.txt
    brotli 0.4.0 -5            31 MB/s   396 MB/s       47585  38.01 asyoulik.txt
    zstd 0.8.0 -8              31 MB/s   632 MB/s       47634  38.05 asyoulik.txt
    xpack 2016-06-02 -9        22 MB/s   913 MB/s       47801  38.19 asyoulik.txt
    csc 3.3 -1                 20 MB/s    49 MB/s       47996  38.34 asyoulik.txt
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    brotli 0.4.0 -11         1.03 MB/s   315 MB/s        6893  28.02 cp.html
    glza 0.7.1               0.79 MB/s    23 MB/s        6966  28.31 cp.html
    xz 5.2.2 -6              3.89 MB/s    61 MB/s        7600  30.89 cp.html
    xz 5.2.2 -9              3.06 MB/s    53 MB/s        7600  30.89 cp.html
    lzlib 1.7 -9             4.64 MB/s    53 MB/s        7610  30.93 cp.html
    lzma 9.38 -5             2.01 MB/s    72 MB/s        7624  30.99 cp.html
    lzlib 1.7 -6             6.03 MB/s    53 MB/s        7649  31.09 cp.html
    brotli 0.4.0 -8          8.37 MB/s   431 MB/s        7675  31.20 cp.html
    brotli 0.4.0 -5            35 MB/s   431 MB/s        7759  31.54 cp.html
    zstd 0.8.0 -22           7.53 MB/s   723 MB/s        7761  31.54 cp.html
    zstd 0.8.0 -18           7.53 MB/s   723 MB/s        7761  31.54 cp.html
    lzlib 1.7 -0             8.12 MB/s    52 MB/s        7789  31.66 cp.html
    zstd 0.8.0 -15             12 MB/s   723 MB/s        7837  31.85 cp.html
    zstd 0.8.0 -11             54 MB/s   793 MB/s        7899  32.11 cp.html
    xz 5.2.2 -3              9.22 MB/s    61 MB/s        7924  32.21 cp.html
    zlib 1.2.8 -9              46 MB/s   384 MB/s        7940  32.27 cp.html
    xpack 2016-06-02 -9        68 MB/s   946 MB/s        7954  32.33 cp.html
    zlib 1.2.8 -6              53 MB/s   384 MB/s        7961  32.36 cp.html
    zstd 0.8.0 -8              79 MB/s   768 MB/s        7965  32.37 cp.html
    lzma 9.38 -4             8.45 MB/s    70 MB/s        7990  32.48 cp.html
    lzma 9.38 -2               22 MB/s    70 MB/s        7991  32.48 cp.html
    xpack 2016-06-02 -6        88 MB/s   946 MB/s        7992  32.48 cp.html
    lzlib 1.7 -3               11 MB/s    51 MB/s        8017  32.59 cp.html
    zstd 0.8.0 -5             132 MB/s   745 MB/s        8085  32.86 cp.html
    lzma 9.38 -0               26 MB/s    68 MB/s        8102  32.93 cp.html
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    brotli 0.4.0 -11         0.95 MB/s   484 MB/s        2717  24.37 fields.c
    glza 0.7.1               0.03 MB/s    17 MB/s        2812  25.22 fields.c
    lzma 9.38 -5             1.09 MB/s    87 MB/s        2967  26.61 fields.c
    brotli 0.4.0 -8          7.15 MB/s   586 MB/s        2968  26.62 fields.c
    xz 5.2.2 -6              3.72 MB/s    70 MB/s        2981  26.74 fields.c
    xz 5.2.2 -9              2.82 MB/s    51 MB/s        2981  26.74 fields.c
    lzlib 1.7 -9             4.68 MB/s    63 MB/s        2988  26.80 fields.c
    lzlib 1.7 -6             5.80 MB/s    63 MB/s        3000  26.91 fields.c
    brotli 0.4.0 -5            27 MB/s   586 MB/s        3012  27.01 fields.c
    zstd 0.8.0 -18           7.82 MB/s   743 MB/s        3026  27.14 fields.c
    zstd 0.8.0 -15           8.83 MB/s   743 MB/s        3026  27.14 fields.c
    zstd 0.8.0 -22           7.82 MB/s   743 MB/s        3026  27.14 fields.c
    lzlib 1.7 -0             7.66 MB/s    62 MB/s        3033  27.20 fields.c
    zstd 0.8.0 -11             16 MB/s   743 MB/s        3087  27.69 fields.c
    xz 5.2.2 -3              7.75 MB/s    70 MB/s        3111  27.90 fields.c
    zlib 1.2.8 -9              47 MB/s   484 MB/s        3115  27.94 fields.c
    zstd 0.8.0 -8              62 MB/s   796 MB/s        3122  28.00 fields.c
    zlib 1.2.8 -6              56 MB/s   484 MB/s        3122  28.00 fields.c
    lzma 9.38 -4             4.79 MB/s    83 MB/s        3126  28.04 fields.c
    lzma 9.38 -0               27 MB/s    83 MB/s        3126  28.04 fields.c
    lzma 9.38 -2               19 MB/s    83 MB/s        3126  28.04 fields.c
    xpack 2016-06-02 -9        74 MB/s   929 MB/s        3126  28.04 fields.c
    xpack 2016-06-02 -6        90 MB/s   929 MB/s        3149  28.24 fields.c
    lzlib 1.7 -3               11 MB/s    60 MB/s        3162  28.36 fields.c
    zstd 0.8.0 -5             107 MB/s   743 MB/s        3163  28.37 fields.c
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    brotli 0.4.0 -11         0.85 MB/s   372 MB/s        1126  30.26 grammar.lsp
    glza 0.7.1               0.01 MB/s  9.05 MB/s        1175  31.58 grammar.lsp
    brotli 0.4.0 -8          4.50 MB/s   477 MB/s        1185  31.85 grammar.lsp
    brotli 0.4.0 -5            14 MB/s   472 MB/s        1189  31.95 grammar.lsp
    zstd 0.8.0 -15             10 MB/s   542 MB/s        1221  32.81 grammar.lsp
    zstd 0.8.0 -22           9.47 MB/s   542 MB/s        1222  32.84 grammar.lsp
    zstd 0.8.0 -18           9.47 MB/s   542 MB/s        1222  32.84 grammar.lsp
    zlib 1.2.8 -9              66 MB/s   471 MB/s        1222  32.84 grammar.lsp
    zlib 1.2.8 -6              70 MB/s   474 MB/s        1222  32.84 grammar.lsp
    lzma 9.38 -5             0.41 MB/s    95 MB/s        1234  33.16 grammar.lsp
    zstd 0.8.0 -11             18 MB/s   536 MB/s        1239  33.30 grammar.lsp
    zstd 0.8.0 -8              86 MB/s   579 MB/s        1243  33.40 grammar.lsp
    zstd 0.8.0 -5             143 MB/s   568 MB/s        1246  33.49 grammar.lsp
    xz 5.2.2 -6              2.84 MB/s    67 MB/s        1247  33.51 grammar.lsp
    xz 5.2.2 -9              1.90 MB/s    32 MB/s        1247  33.51 grammar.lsp
    lzlib 1.7 -9             4.86 MB/s    64 MB/s        1259  33.83 grammar.lsp
    lzlib 1.7 -6             5.85 MB/s    64 MB/s        1260  33.86 grammar.lsp
    xpack 2016-06-02 -9        84 MB/s   766 MB/s        1263  33.94 grammar.lsp
    xpack 2016-06-02 -6        90 MB/s   767 MB/s        1263  33.94 grammar.lsp
    lzlib 1.7 -0             6.97 MB/s    63 MB/s        1265  34.00 grammar.lsp
    lzma 9.38 -4             1.80 MB/s    90 MB/s        1271  34.16 grammar.lsp
    lzma 9.38 -2               11 MB/s    90 MB/s        1271  34.16 grammar.lsp
    lzma 9.38 -0               21 MB/s    90 MB/s        1271  34.16 grammar.lsp
    xz 5.2.2 -3              4.68 MB/s    70 MB/s        1284  34.51 grammar.lsp
    lzlib 1.7 -3             9.59 MB/s    63 MB/s        1289  34.64 grammar.lsp
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.47 MB/s   348 MB/s       23099   2.24 kennedy.xls
    lzlib 1.7 -3               23 MB/s   170 MB/s       42537   4.13 kennedy.xls
    xz 5.2.2 -6              3.25 MB/s   195 MB/s       49071   4.77 kennedy.xls
    xz 5.2.2 -9              3.27 MB/s   192 MB/s       49071   4.77 kennedy.xls
    lzlib 1.7 -6             2.94 MB/s   144 MB/s       51229   4.97 kennedy.xls
    lzma 9.38 -5             3.24 MB/s   222 MB/s       51388   4.99 kennedy.xls
    lzlib 1.7 -9             1.73 MB/s   143 MB/s       51576   5.01 kennedy.xls
    lzlib 1.7 -0               71 MB/s   163 MB/s       56620   5.50 kennedy.xls
    csc 3.3 -3               8.55 MB/s   221 MB/s       57135   5.55 kennedy.xls
    csc 3.3 -5               6.64 MB/s   219 MB/s       60279   5.85 kennedy.xls
    tornado 0.6a -16         5.57 MB/s   257 MB/s       62140   6.03 kennedy.xls
    brotli 0.4.0 -11         0.46 MB/s   618 MB/s       62192   6.04 kennedy.xls
    lzham 1.0 -d26 -1        5.66 MB/s   336 MB/s       62201   6.04 kennedy.xls
    tornado 0.6a -13         9.28 MB/s   256 MB/s       62775   6.10 kennedy.xls
    lzma 9.38 -4               32 MB/s   260 MB/s       66394   6.45 kennedy.xls
    lzma 9.38 -2               39 MB/s   260 MB/s       66432   6.45 kennedy.xls
    xz 5.2.2 -3                16 MB/s   229 MB/s       67536   6.56 kennedy.xls
    brotli 0.4.0 -8            21 MB/s   736 MB/s       68136   6.62 kennedy.xls
    xz 5.2.2 -0                66 MB/s   245 MB/s       68663   6.67 kennedy.xls
    lzma 9.38 -0               59 MB/s   255 MB/s       69143   6.71 kennedy.xls
    zstd 0.8.0 -22           1.86 MB/s   545 MB/s       70226   6.82 kennedy.xls
    brotli 0.4.0 -5            61 MB/s   713 MB/s       71510   6.94 kennedy.xls
    csc 3.3 -1                 42 MB/s   192 MB/s       72935   7.08 kennedy.xls
    xpack 2016-06-02 -9        17 MB/s  1180 MB/s       77766   7.55 kennedy.xls
    xpack 2016-06-02 -6        42 MB/s  1162 MB/s       78243   7.60 kennedy.xls
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.11 MB/s    49 MB/s       97262  22.79 lcet10.txt
    brotli 0.4.0 -11         0.77 MB/s   430 MB/s      113475  26.59 lcet10.txt
    csc 3.3 -5               7.43 MB/s    70 MB/s      118720  27.82 lcet10.txt
    lzlib 1.7 -6             4.13 MB/s    67 MB/s      119134  27.92 lcet10.txt
    lzlib 1.7 -9             3.76 MB/s    68 MB/s      119267  27.95 lcet10.txt
    xz 5.2.2 -6              4.29 MB/s    84 MB/s      119446  27.99 lcet10.txt
    xz 5.2.2 -9              4.09 MB/s    83 MB/s      119446  27.99 lcet10.txt
    lzma 9.38 -5             3.95 MB/s    94 MB/s      119519  28.01 lcet10.txt
    csc 3.3 -3               9.17 MB/s    67 MB/s      120628  28.27 lcet10.txt
    zstd 0.8.0 -22           5.37 MB/s   777 MB/s      121995  28.59 lcet10.txt
    zstd 0.8.0 -18           7.49 MB/s   811 MB/s      122632  28.74 lcet10.txt
    tornado 0.6a -16         5.55 MB/s   218 MB/s      124713  29.22 lcet10.txt
    zling 2016-01-10 -4        40 MB/s   137 MB/s      126706  29.69 lcet10.txt
    zstd 0.8.0 -15             13 MB/s   827 MB/s      127506  29.88 lcet10.txt
    zling 2016-01-10 -3        43 MB/s   137 MB/s      127565  29.89 lcet10.txt
    lzham 1.0 -d26 -1        3.62 MB/s   228 MB/s      127859  29.96 lcet10.txt
    brotli 0.4.0 -8            17 MB/s   536 MB/s      128084  30.01 lcet10.txt
    tornado 0.6a -13         7.56 MB/s   212 MB/s      128203  30.04 lcet10.txt
    csc 3.3 -1                 25 MB/s    64 MB/s      128890  30.20 lcet10.txt
    zstd 0.8.0 -11             33 MB/s   808 MB/s      129899  30.44 lcet10.txt
    zling 2016-01-10 -2        48 MB/s   135 MB/s      130036  30.47 lcet10.txt
    lzlib 1.7 -3             8.56 MB/s    62 MB/s      130566  30.60 lcet10.txt
    xz 5.2.2 -3                10 MB/s    79 MB/s      130823  30.66 lcet10.txt
    xpack 2016-06-02 -9        11 MB/s  1131 MB/s      131026  30.70 lcet10.txt
    zstd 0.8.0 -8              47 MB/s   790 MB/s      131246  30.75 lcet10.txt
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.17 MB/s    51 MB/s      136857  28.40 plrabn12.txt
    brotli 0.4.0 -11         0.81 MB/s   375 MB/s      163282  33.89 plrabn12.txt
    lzlib 1.7 -9             3.76 MB/s    59 MB/s      165247  34.29 plrabn12.txt
    lzma 9.38 -5             3.72 MB/s    82 MB/s      165311  34.31 plrabn12.txt
    xz 5.2.2 -6              4.01 MB/s    72 MB/s      165346  34.31 plrabn12.txt
    xz 5.2.2 -9              3.88 MB/s    72 MB/s      165346  34.31 plrabn12.txt
    lzlib 1.7 -6             3.87 MB/s    58 MB/s      165400  34.33 plrabn12.txt
    csc 3.3 -5               7.68 MB/s    55 MB/s      165487  34.34 plrabn12.txt
    csc 3.3 -3               8.48 MB/s    55 MB/s      166767  34.61 plrabn12.txt
    zstd 0.8.0 -18           7.10 MB/s   685 MB/s      168550  34.98 plrabn12.txt
    zstd 0.8.0 -22           5.42 MB/s   672 MB/s      168627  34.99 plrabn12.txt
    tornado 0.6a -16         5.68 MB/s   188 MB/s      171140  35.52 plrabn12.txt
    zling 2016-01-10 -4        34 MB/s   126 MB/s      173903  36.09 plrabn12.txt
    zling 2016-01-10 -3        38 MB/s   125 MB/s      174762  36.27 plrabn12.txt
    tornado 0.6a -13         7.92 MB/s   181 MB/s      175832  36.49 plrabn12.txt
    lzlib 1.7 -3             6.59 MB/s    54 MB/s      176016  36.53 plrabn12.txt
    zstd 0.8.0 -15             12 MB/s   705 MB/s      176419  36.61 plrabn12.txt
    zling 2016-01-10 -2        44 MB/s   125 MB/s      176467  36.62 plrabn12.txt
    lzham 1.0 -d26 -1        3.42 MB/s   206 MB/s      176829  36.70 plrabn12.txt
    zling 2016-01-10 -1        47 MB/s   124 MB/s      177283  36.79 plrabn12.txt
    csc 3.3 -1                 21 MB/s    52 MB/s      178138  36.97 plrabn12.txt
    brotli 0.4.0 -8            14 MB/s   453 MB/s      178173  36.98 plrabn12.txt
    zstd 0.8.0 -11             26 MB/s   686 MB/s      178997  37.15 plrabn12.txt
    zling 2016-01-10 -0        49 MB/s   122 MB/s      179345  37.22 plrabn12.txt
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    lzlib 1.7 -9             1.70 MB/s   154 MB/s       39587   7.71 ptt5
    brotli 0.4.0 -11         0.55 MB/s   747 MB/s       40987   7.99 ptt5
    xz 5.2.2 -6              5.94 MB/s   209 MB/s       41945   8.17 ptt5
    xz 5.2.2 -9              5.64 MB/s   204 MB/s       41945   8.17 ptt5
    lzlib 1.7 -6             8.60 MB/s   147 MB/s       43341   8.44 ptt5
    lzma 9.38 -5             8.65 MB/s   242 MB/s       43495   8.47 ptt5
    zstd 0.8.0 -22           2.76 MB/s  1763 MB/s       43800   8.53 ptt5
    csc 3.3 -5               9.56 MB/s   201 MB/s       44506   8.67 ptt5
    lzham 1.0 -d26 -1        6.79 MB/s   444 MB/s       46023   8.97 ptt5
    csc 3.3 -3                 33 MB/s   197 MB/s       46486   9.06 ptt5
    lzham 1.0 -d26 -0          15 MB/s   429 MB/s       47189   9.19 ptt5
    lzlib 1.7 -3               20 MB/s   140 MB/s       47221   9.20 ptt5
    glza 0.7.1               0.22 MB/s   103 MB/s       47337   9.22 ptt5
    csc 3.3 -1                 57 MB/s   195 MB/s       47514   9.26 ptt5
    lzma 9.38 -0               70 MB/s   223 MB/s       47570   9.27 ptt5
    lzlib 1.7 -0               74 MB/s   138 MB/s       47693   9.29 ptt5
    xz 5.2.2 -0                68 MB/s   199 MB/s       48249   9.40 ptt5
    xz 5.2.2 -3                37 MB/s   197 MB/s       48356   9.42 ptt5
    brotli 0.4.0 -8            41 MB/s   833 MB/s       48415   9.43 ptt5
    brotli 0.4.0 -5            91 MB/s   823 MB/s       48417   9.43 ptt5
    tornado 0.6a -16         3.49 MB/s   445 MB/s       48650   9.48 ptt5
    zstd 0.8.0 -15             32 MB/s  1973 MB/s       49530   9.65 ptt5
    zstd 0.8.0 -18             24 MB/s  1819 MB/s       49734   9.69 ptt5
    zstd 0.8.0 -8             162 MB/s  1846 MB/s       49863   9.72 ptt5
    zstd 0.8.0 -11            130 MB/s  1859 MB/s       49924   9.73 ptt5
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    xz 5.2.2 -9              3.45 MB/s    62 MB/s        9406  24.60 sum
    xz 5.2.2 -6              4.35 MB/s    68 MB/s        9406  24.60 sum
    lzlib 1.7 -9             4.24 MB/s    61 MB/s        9414  24.62 sum
    lzma 9.38 -5             2.56 MB/s    81 MB/s        9422  24.64 sum
    lzlib 1.7 -6             5.66 MB/s    60 MB/s        9447  24.70 sum
    lzlib 1.7 -0             6.21 MB/s    60 MB/s        9463  24.75 sum
    lzlib 1.7 -3             9.36 MB/s    59 MB/s       10068  26.33 sum
    xz 5.2.2 -0                26 MB/s    67 MB/s       10159  26.57 sum
    brotli 0.4.0 -11         0.86 MB/s   283 MB/s       10198  26.67 sum
    lzma 9.38 -2               24 MB/s    79 MB/s       10229  26.75 sum
    lzma 9.38 -4               11 MB/s    79 MB/s       10229  26.75 sum
    xz 5.2.2 -3                11 MB/s    66 MB/s       10237  26.77 sum
    lzma 9.38 -0               27 MB/s    78 MB/s       10282  26.89 sum
    csc 3.3 -5               7.95 MB/s    69 MB/s       10459  27.35 sum
    csc 3.3 -3                 10 MB/s    69 MB/s       10553  27.60 sum
    lzham 1.0 -d26 -1        4.99 MB/s   113 MB/s       10948  28.63 sum
    csc 3.3 -1                 25 MB/s    67 MB/s       11144  29.14 sum
    zstd 0.8.0 -22           7.52 MB/s   616 MB/s       11170  29.21 sum
    zstd 0.8.0 -18           8.45 MB/s   616 MB/s       11172  29.22 sum
    zstd 0.8.0 -15             10 MB/s   616 MB/s       11193  29.27 sum
    brotli 0.4.0 -8            10 MB/s   354 MB/s       11535  30.16 sum
    brotli 0.4.0 -5            36 MB/s   354 MB/s       11556  30.22 sum
    lzham 1.0 -d26 -0        7.27 MB/s   108 MB/s       11638  30.43 sum
    xpack 2016-06-02 -9        70 MB/s   780 MB/s       11670  30.52 sum
    xpack 2016-06-02 -6        89 MB/s   796 MB/s       11691  30.57 sum
    glza 0.7.1               0.16 MB/s    24 MB/s       11896  31.11 sum
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    brotli 0.4.0 -11         0.89 MB/s   325 MB/s        1463  34.61 xargs.1
    glza 0.7.1               0.01 MB/s  9.69 MB/s        1592  37.66 xargs.1
    brotli 0.4.0 -8          3.82 MB/s   422 MB/s        1649  39.01 xargs.1
    brotli 0.4.0 -5            14 MB/s   422 MB/s        1655  39.15 xargs.1
    zlib 1.2.8 -6              64 MB/s   431 MB/s        1736  41.07 xargs.1
    zlib 1.2.8 -9              63 MB/s   428 MB/s        1736  41.07 xargs.1
    zstd 0.8.0 -22             11 MB/s   494 MB/s        1739  41.14 xargs.1
    zstd 0.8.0 -18             11 MB/s   494 MB/s        1739  41.14 xargs.1
    zstd 0.8.0 -15             11 MB/s   494 MB/s        1739  41.14 xargs.1
    zstd 0.8.0 -11             17 MB/s   499 MB/s        1745  41.28 xargs.1
    zstd 0.8.0 -8              91 MB/s   523 MB/s        1746  41.31 xargs.1
    zstd 0.8.0 -5             117 MB/s   516 MB/s        1749  41.38 xargs.1
    lzma 9.38 -5             0.47 MB/s    65 MB/s        1752  41.45 xargs.1
    xpack 2016-06-02 -9        82 MB/s   676 MB/s        1765  41.76 xargs.1
    xz 5.2.2 -6              2.58 MB/s    50 MB/s        1766  41.78 xargs.1
    xz 5.2.2 -9              1.81 MB/s    28 MB/s        1766  41.78 xargs.1
    xpack 2016-06-02 -6        82 MB/s   676 MB/s        1767  41.80 xargs.1
    lzlib 1.7 -9             5.14 MB/s    45 MB/s        1779  42.09 xargs.1
    lzlib 1.7 -6             5.74 MB/s    45 MB/s        1782  42.16 xargs.1
    lzlib 1.7 -0             6.53 MB/s    45 MB/s        1789  42.32 xargs.1
    lzma 9.38 -4             1.98 MB/s    64 MB/s        1799  42.56 xargs.1
    lzma 9.38 -2               11 MB/s    64 MB/s        1799  42.56 xargs.1
    lzma 9.38 -0               18 MB/s    64 MB/s        1799  42.56 xargs.1
    xz 5.2.2 -3              4.04 MB/s    51 MB/s        1811  42.84 xargs.1
    lzlib 1.7 -3             8.40 MB/s    45 MB/s        1811  42.84 xargs.1
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.33 MB/s    72 MB/s      716442  17.70 bible.txt
    csc 3.3 -5               5.07 MB/s    87 MB/s      851882  21.05 bible.txt
    lzlib 1.7 -9             2.36 MB/s    87 MB/s      884235  21.85 bible.txt
    xz 5.2.2 -6              2.88 MB/s   115 MB/s      885002  21.87 bible.txt
    xz 5.2.2 -9              2.86 MB/s   115 MB/s      885002  21.87 bible.txt
    lzlib 1.7 -6             2.65 MB/s    87 MB/s      886785  21.91 bible.txt
    lzma 9.38 -5             2.79 MB/s   127 MB/s      888300  21.95 bible.txt
    brotli 0.4.0 -11         0.67 MB/s   585 MB/s      890810  22.01 bible.txt
    zstd 0.8.0 -22           3.25 MB/s   933 MB/s      894717  22.11 bible.txt
    tornado 0.6a -16         3.15 MB/s   294 MB/s      898872  22.21 bible.txt
    csc 3.3 -3               8.31 MB/s    80 MB/s      901173  22.27 bible.txt
    zstd 0.8.0 -18           4.08 MB/s   927 MB/s      905974  22.38 bible.txt
    tornado 0.6a -13         6.22 MB/s   266 MB/s      966485  23.88 bible.txt
    zstd 0.8.0 -15           7.19 MB/s   947 MB/s      972617  24.03 bible.txt
    lzham 1.0 -d26 -1        3.20 MB/s   312 MB/s      984755  24.33 bible.txt
    csc 3.3 -1                 28 MB/s    78 MB/s      989332  24.44 bible.txt
    tornado 0.6a -10         9.90 MB/s   283 MB/s      989367  24.44 bible.txt
    zling 2016-01-10 -4        50 MB/s   226 MB/s     1000857  24.73 bible.txt
    zling 2016-01-10 -3        62 MB/s   224 MB/s     1015300  25.09 bible.txt
    brotli 0.4.0 -8            12 MB/s   608 MB/s     1017025  25.13 bible.txt
    zstd 0.8.0 -11             24 MB/s   879 MB/s     1031151  25.48 bible.txt
    tornado 0.6a -7            27 MB/s   267 MB/s     1031847  25.49 bible.txt
    zling 2016-01-10 -2        78 MB/s   224 MB/s     1040712  25.71 bible.txt
    crush 1.0 -2             0.45 MB/s   454 MB/s     1045206  25.82 bible.txt
    xz 5.2.2 -3              8.21 MB/s    94 MB/s     1056901  26.11 bible.txt
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.12 MB/s    24 MB/s     1137825  24.53 E.coli
    brotli 0.4.0 -11         0.44 MB/s   384 MB/s     1138119  24.54 E.coli
    tornado 0.6a -16         2.12 MB/s   241 MB/s     1181243  25.47 E.coli
    lzma 9.38 -5             1.71 MB/s   129 MB/s     1185115  25.55 E.coli
    xz 5.2.2 -9              1.80 MB/s   120 MB/s     1186258  25.57 E.coli
    xz 5.2.2 -6              1.80 MB/s   121 MB/s     1186258  25.57 E.coli
    lzlib 1.7 -6             1.64 MB/s    82 MB/s     1186445  25.58 E.coli
    lzlib 1.7 -9             1.62 MB/s    82 MB/s     1187207  25.59 E.coli
    zstd 0.8.0 -22           2.08 MB/s   790 MB/s     1198623  25.84 E.coli
    zstd 0.8.0 -18           2.20 MB/s   774 MB/s     1198895  25.85 E.coli
    csc 3.3 -5               1.90 MB/s   118 MB/s     1208731  26.06 E.coli
    tornado 0.6a -13         7.96 MB/s   203 MB/s     1224769  26.40 E.coli
    lzham 1.0 -d26 -1        3.48 MB/s   241 MB/s     1243583  26.81 E.coli
    zstd 0.8.0 -15           3.95 MB/s   827 MB/s     1269985  27.38 E.coli
    csc 3.3 -3               5.54 MB/s    87 MB/s     1273792  27.46 E.coli
    tornado 0.6a -10         9.06 MB/s   254 MB/s     1277805  27.55 E.coli
    lzlib 1.7 -3             7.26 MB/s    83 MB/s     1283007  27.66 E.coli
    zlib 1.2.8 -9            1.34 MB/s   479 MB/s     1299717  28.02 E.coli
    brotli 0.4.0 -8          5.38 MB/s   503 MB/s     1317226  28.40 E.coli
    xpack 2016-06-02 -9      4.84 MB/s  1183 MB/s     1325115  28.57 E.coli
    tornado 0.6a -7            29 MB/s   207 MB/s     1325226  28.57 E.coli
    zstd 0.8.0 -11             23 MB/s   645 MB/s     1333833  28.75 E.coli
    xz 5.2.2 -3              9.05 MB/s    76 MB/s     1335356  28.79 E.coli
    zlib 1.2.8 -6            6.68 MB/s   449 MB/s     1341990  28.93 E.coli
    zstd 0.8.0 -8              39 MB/s   604 MB/s     1349606  29.09 E.coli
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.7.1               0.12 MB/s    66 MB/s      414039  16.74 world192.txt
    brotli 0.4.0 -11         0.69 MB/s   520 MB/s      475200  19.21 world192.txt
    csc 3.3 -5               5.18 MB/s    89 MB/s      482532  19.51 world192.txt
    lzlib 1.7 -9             2.82 MB/s    87 MB/s      483798  19.56 world192.txt
    xz 5.2.2 -9              3.69 MB/s   113 MB/s      487374  19.70 world192.txt
    xz 5.2.2 -6              3.83 MB/s   113 MB/s      487374  19.70 world192.txt
    lzlib 1.7 -6             3.92 MB/s    86 MB/s      496279  20.06 world192.txt
    lzma 9.38 -5             4.24 MB/s   125 MB/s      499271  20.19 world192.txt
    zstd 0.8.0 -22           3.55 MB/s  1028 MB/s      506441  20.48 world192.txt
    tornado 0.6a -16         3.48 MB/s   302 MB/s      507530  20.52 world192.txt
    zstd 0.8.0 -18           6.00 MB/s  1053 MB/s      522618  21.13 world192.txt
    zstd 0.8.0 -15           8.63 MB/s  1069 MB/s      532074  21.51 world192.txt
    tornado 0.6a -13         6.78 MB/s   285 MB/s      534502  21.61 world192.txt
    csc 3.3 -3                 11 MB/s    82 MB/s      535400  21.65 world192.txt
    zling 2016-01-10 -4        64 MB/s   236 MB/s      538842  21.79 world192.txt
    tornado 0.6a -10           10 MB/s   296 MB/s      541285  21.88 world192.txt
    brotli 0.4.0 -8            18 MB/s   623 MB/s      541411  21.89 world192.txt
    lzham 1.0 -d26 -1        3.48 MB/s   320 MB/s      545944  22.07 world192.txt
    zling 2016-01-10 -3        71 MB/s   234 MB/s      546018  22.08 world192.txt
    xz 5.2.2 -3                12 MB/s   106 MB/s      550228  22.25 world192.txt
    zling 2016-01-10 -2        82 MB/s   231 MB/s      556779  22.51 world192.txt
    zstd 0.8.0 -11             31 MB/s  1009 MB/s      558678  22.59 world192.txt
    tornado 0.6a -7            27 MB/s   281 MB/s      561201  22.69 world192.txt
    zling 2016-01-10 -1        89 MB/s   227 MB/s      569705  23.03 world192.txt
    crush 1.0 -2             1.81 MB/s   490 MB/s      572170  23.13 world192.txt
    GLZA has the best compression ratio on 8 of the 14 files but is the slowest in almost all cases. Brotli has the best compression ratio on 4 of the 14 files and is fast; it's really pretty impressive. It would be nice if Kraken were available to see how it compares.
    Last edited by Kennon Conrad; 15th August 2016 at 08:33.

  30. The Following 5 Users Say Thank You to Kennon Conrad For This Useful Post:

    Bulat Ziganshin (14th August 2016),Jyrki Alakuijala (16th August 2016),Mike (14th August 2016),pixar (20th August 2016),surfersat (28th September 2016)

  31. #803
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    233
    Thanks
    92
    Thanked 47 Times in 31 Posts
    Kennon, have you tried Zstd with a dictionary? You can train a dictionary once then use it forever. It should be trained on relevant content; even training it on the target file itself should work. Brotli has a built-in text/web dictionary, so for that kind of content, the best comparison would be Zstd with a similar dictionary.

  32. #804
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by SolidComp View Post
    Kennon, have you tried Zstd with a dictionary? You can train a dictionary once then use it forever. It should be trained on relevant content; even training it on the target file itself should work. Brotli has a built-in text/web dictionary, so for that kind of content, the best comparison would be Zstd with a similar dictionary.
    No, I have never tried Zstd at all outside of my recent work on lzbench integration. I have noticed it has impressive performance and follow the thread but that's all. You are right, it's not really fair to compare compressors with options set differently, but Brotli gets the advantage because it uses a dictionary by default. Since most compressors don't have a built-in dictionary, it seems like it might be best to turn Brotli's off. I have tried turning off the dictionary in brotli when testing separately but never figured out how to do it. It might be good if lzbench turned off the dictionary too, if possible.

    It seems like a dictionary could help GLZA too, maybe more than for Brotli since it's more dictionary based. There is usually a lot of commonality between the dictionaries produced by GLZA for different files. There's probably no reason GLZA couldn't start with a basic dictionary of the (really) common words/structures/phrases and go from there. Also, a dictionary could be used to train the o(1) trailing/leading character model, which would help a bit, most noticeably on small files.
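
    To make the preloaded-dictionary idea concrete, here is a minimal sketch using Python's standard zlib module, which accepts a preset dictionary through its zdict parameter. This is deflate, not Zstd, Brotli or GLZA, and the dictionary contents (SHARED_DICT) and the sample input are made up for illustration. Both sides must hold the same dictionary bytes.

```python
import zlib

# Hypothetical shared dictionary: byte strings expected to recur in the
# target content. A real dictionary would be built from representative files.
SHARED_DICT = (
    b'<meta property="" content="" />'
    b"<html><head></head><body></body></html>https://.com"
)

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data`, letting matches refer back into the preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inflate `blob`; the same dictionary bytes must be supplied."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()

# Made-up small input; small files are where a preset dictionary matters most.
sample = (
    b'<html><head><meta property="og:title" content="x" /></head>'
    b"<body></body></html>"
)
plain = zlib.compress(sample, 9)
primed = compress_with_dict(sample, SHARED_DICT)

assert decompress_with_dict(primed, SHARED_DICT) == sample
print("original:", len(sample), "no dictionary:", len(plain), "with dictionary:", len(primed))
```

    Running it prints the three sizes so the effect on a tiny input can be checked directly. How much a dictionary helps depends entirely on how well it matches the content.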

  33. #805
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    233
    Thanks
    92
    Thanked 47 Times in 31 Posts
    Quote Originally Posted by Kennon Conrad View Post
    It seems like a dictionary could help GLZA too, maybe more than for Brotli since it's more dictionary based. There is usually a lot of commonality between the dictionaries produced by GLZA for different files. There's probably no reason GLZA couldn't start with a basic dictionary of the (really) common words/structures/phrases and go from there. Also, a dictionary could be used to train the o(1) trailing/leading character model, which would help a bit, most noticeably on small files.
    For web compression (HTML, CSS, JS), we really ought to have a simple dictionary for the tags, keywords, and URL strings – I'm still surprised that the web, as important as it is, doesn't have any sort of tailored compression. I'd be very curious to see how GLZA does with dictionaries, since it's already so good. I also think some sort of interpolated string dictionary would be interesting. We tend to focus on continuous strings, but multi-part punctuated or interpolated strings have a lot of potential. In the simplest case, we have start tags and end tags in XML/HTML that we ought to encode as a unit, and we have lots of URLs that start with the same https:// sequence and end with the same .com or .jpg strings, differing only in the middle section.

    By the way, do you have updated RAM memory usage figures for GLZA compression and decompression?

  34. #806
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    I am sorry for the slow reply, I wasn't feeling well the last few days.

    Quote Originally Posted by SolidComp View Post
    For web compression (HTML, CSS, JS), we really ought to have a simple dictionary for the tags, keywords, and URL strings – I'm still surprised that the web, as important as it is, doesn't have any sort of tailored compression.
    Have you considered XWRT? I have never been able to get better compression ratios using it as a preprocessor for GLZA on web pages but it does seem to help several other compressors and I think it is at least somewhat tailored to the web, at least for tags.

    Quote Originally Posted by SolidComp View Post
    I'd be very curious to see how GLZA does with dictionaries, since it's already so good.
    Me too! Unfortunately, I think it's a little more complicated than with other LZ style programs such as Brotli. It seems that with LZ77 you could just put a dictionary in the history at the start and use that to send matches right from the start. You could do something similar with GLZ but it's a little different because it references grammar rules instead of offsets. Normally if a string like "<page>" was used at the start of a file, it would not be likely to be sent as just one rule S1 -> <page>, but instead it might be sent S1 -> <S2>, S2 -> pS3, S3 -> age, so that "age" and "page" also go into the dictionary. So I need to decide how to approach this deviation from the normal case. If I pull <page> out of a prebuilt history, I need to decide if GLZA would create the other dictionary entries at that time by adding additional codes to indicate which substrings should go into the dictionary, or whether putting substrings into the dictionary would be deferred until later when (and if) those substrings appear for the first time and are not coming from the dictionary. Honestly, I'm not sure which is the "right" way to go.
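
    The "<page>" example above can be written out as a tiny grammar to show which dictionary entries fall out of the nested rules. This is only an illustrative sketch in Python; the RULES layout and the function names are assumptions, not GLZA's actual data structures.

```python
# Rules from the example: S1 -> <S2>, S2 -> pS3, S3 -> age.
# Each right-hand side is a list of items. An item that is a key of RULES
# is a reference to another rule; anything else is literal text.
RULES = {
    "S1": ["<", "S2", ">"],
    "S2": ["p", "S3"],
    "S3": ["age"],
}

def expand(symbol, rules):
    """Recursively expand a rule name into the string it produces."""
    if symbol not in rules:
        return symbol  # literal text
    return "".join(expand(item, rules) for item in rules[symbol])

def dictionary_entries(rules):
    """Every rule contributes one dictionary entry: its full expansion."""
    return {name: expand(name, rules) for name in rules}

print(expand("S1", RULES))
print(dictionary_entries(RULES))
```

    Sending S1 as nested rules leaves "<page>", "page" and "age" all in the dictionary, whereas a single rule S1 -> <page> would leave only the first. That difference is the decision described above for strings pulled from a prebuilt history.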

    Quote Originally Posted by SolidComp View Post
    I also think some sort of interpolated string dictionary would be interesting. We tend to focus on continuous strings, but multi-part punctuated or interpolated strings have a lot of potential. In the simplest case, we have start tags and end tags in XML/HTML that we ought to encode as a unit, and we have lots of URLs that start with the same https:// sequence and end with the same .com or .jpg strings, differing only in the middle section.
    Yes, I completely agree. You are getting into some areas that Paul and I have discussed a bit. For tags, it seems like production rules such as Stag -> <tag>S</tag> could be very useful if supported and used properly. I think it makes the grammar into one that is not straight-line, but don't see why that would be a problem. Similarly, production rules like Shttps-com -> https://S.com could be used for some URLs. Since GLZ uses production rules, this seems like a natural fit. I have wanted to get to a good baseline before getting serious about adding support for this (it's not trivial, at least for me!) but it seems totally reasonable to add, as long as it is applied properly (probably not a good idea to use Spage -> <page>S</page> on enwik's because it's probably more effective to just deduplicate "</page><page>"). This brings me back to the dictionary idea and the thought that maybe it's not just a preloaded "dictionary" that's best for GLZA, but also preloaded "skip" production rules and a mechanism to create them on the fly from just the "tag".
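
    A rough Python sketch of what such "skip" production rules could look like: each rule is a prefix and a suffix with one hole in the middle. GLZA does not support this today, and the SKIP_RULES table and both function names are hypothetical.

```python
# Hypothetical "skip" rules: name -> (prefix, suffix) around a single hole.
# These mirror Stag -> <tag>S</tag> and Shttps-com -> https://S.com.
SKIP_RULES = {
    "Stag": ("<tag>", "</tag>"),
    "Shttps-com": ("https://", ".com"),
}

def apply_rule(name, middle):
    """Decoder side: fill the hole of a skip rule with `middle`."""
    prefix, suffix = SKIP_RULES[name]
    return prefix + middle + suffix

def match_rule(text):
    """Encoder side: return (rule name, middle) for the first rule whose
    prefix and suffix both match `text`, or None if no rule applies."""
    for name, (prefix, suffix) in SKIP_RULES.items():
        long_enough = len(text) >= len(prefix) + len(suffix)
        if long_enough and text.startswith(prefix) and text.endswith(suffix):
            return name, text[len(prefix):len(text) - len(suffix)]
    return None

print(match_rule("https://example.com"))
print(apply_rule("Stag", "hello"))
```

    With a rule like this, only the rule name and the middle section need to be coded. Whether that beats plain deduplication depends on the data, as the enwik remark above points out.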

    Quote Originally Posted by SolidComp View Post
    By the way, do you have updated RAM memory usage figures for GLZA compression and decompression?
    For enwik9 or which files? I don't have exact numbers but can get them. In general, both compression and decompression memory usage is as high as ever for large files. For compression, that's because I bumped up memory use (can impact window size) and recently made the memory use option unavailable (I will put it back in fairly soon if people care). For decompression it is because lzbench (etc.) expect a buffer with the entire file so I took out some of the buffer management code that allowed the buffer to be much smaller. I will put this back in fairly soon for standalone mode because if the data is being written to disk it doesn't need the history.

    I am curious, which is important to you?

  35. #807
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    233
    Thanks
    92
    Thanked 47 Times in 31 Posts
    Quote Originally Posted by Kennon Conrad View Post
    I am sorry for the slow reply, I wasn't feeling well the last few days.



    Have you considered XWRT? I have never been able to get better compression ratios using it as a preprocessor for GLZA on web pages but it does seem to help several other compressors and I think it is at least somewhat tailored to the web, at least for tags.
    Yes, I like XWRT a lot. However it's not supported by browsers, so I can't use it. I was referring to the fact that there are no web-specific compression solutions that are supported by browsers. It's really strange that we only have generic deflate.


    Yes, I completely agree. You are getting into some areas that Paul and I have discussed a bit. For tags, it seems like production rules such as Stag -> <tag>S</tag> could be very useful if supported and used properly. I think it makes the grammar into one that is not straight-line, but don't see why that would be a problem. Similarly, production rules like Shttps-com -> https://S.com could be used for some URLs. Since GLZ uses production rules, this seems like a natural fit. I have wanted to get to a good baseline before getting serious about adding support for this (it's not trivial, at least for me!) but it seems totally reasonable to add, as long as it is applied properly (probably not a good idea to use Spage -> <page>S</page> on enwik's because it's probably more effective to just deduplicate "</page><page>"). This brings me back to the dictionary idea and the thought that maybe it's not just a preloaded "dictionary" that's best for GLZA, but also preloaded "skip" production rules and a mechanism to create them on the fly from just the "tag".
    Yes, that's exactly the approach I'm thinking of. It might help to combine it with normalization of HTML, CSS, and JS files. They're such a mess, and the specs are terrible, with bizarre statements allowing zero or more spaces in arbitrary places for no reason. That's not how specs are normally written. But one could normalize and minify these files in ways that make compression easier, faster, better, etc. And we could be smarter about string matching and knowing when to not bother, like when we scan the enormous URLs created by some servers on the fly (like Google's PageSpeed module). If we scan a URL like this one:

    (You might have to hover over it – the site is truncating it in the code box). We should know that there's not going to be a significant string match beyond ten or so characters into the path (unless we configure our apps to generate nearly identical URLs when they do this, which I don't think anyone does yet).

    For enwik9 or which files? I don't have exact numbers but can get them. In general, both compression and decompression memory usage is as high as ever for large files. For compression, that's because I bumped up memory use (can impact window size) and recently made the memory use option unavailable (I will put it back in fairly soon if people care). For decompression it is because lzbench (etc.) expect a buffer with the entire file so I took out some of the buffer management code that allowed the buffer to be much smaller. I will put this back in fairly soon for standalone mode because if the data is being written to disk it doesn't need the history.

    I am curious, which is important to you?
    Decompression memory and CPU use are most important to me. I asked because I had seen some discussion of these topics from a few months ago, I think, and it sounded like there were going to be code changes aimed at improving memory use, but I might be remembering it wrong. In general, I'm becoming fairly strident in my position that I can't do anything with compression benchmarks that don't report memory and CPU use for both compression and decompression. I think we get too distracted by the glitter of compression ratios and speed, but a lot of these "winning" codecs use enormous memory and CPU. If something uses 1 GiB of RAM and 100% of a flagship Android phone's CPU to decompress then it's only usable on powerful desktops. I'm most interested in codecs that can replace gzip on the web, in mobile devices and such, so the target content would be much smaller than enwik9 and diverse – HTML, CSS, and JS files, maybe JSON and CSV data (there's an upcoming CSV in HTML standard).

  36. #808
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts
    Quote Originally Posted by SolidComp View Post
    Yes, I like XWRT a lot. However it's not supported by browsers, so I can't use it. I was referring to the fact that there are no web-specific compression solutions that are supported by browsers. It's really strange that we only have generic deflate.
    Thanks for the clarification. This does seem strange. I wonder if it's related to deflate's low memory usage.

    Quote Originally Posted by SolidComp View Post
    Yes, that's exactly the approach I'm thinking of. It might help to combine it with normalization of HTML, CSS, and JS files. They're such a mess, and the specs are terrible, with bizarre statements allowing zero or more spaces in arbitrary places for no reason. That's not how specs are normally written. But one could normalize and minify these files in ways that make compression easier, faster, better, etc. And we could be smarter about string matching and knowing when to not bother, like when we scan the enormous URLs created by some servers on the fly (like Google's PageSpeed module).
    So they have specs? I have been looking and not finding anything useful but maybe need to look harder. I don't know much about parsers but from a little reading it sounds like something like an LL parser should be used.

    You may already know this but GLZA's string matching is a lot different from LZ77 or deflate. Instead of searching for the most efficient local transmission of matches/literals the algorithm recursively searches for the best global matches and creates rules for those. So instead of knowing not to bother with looking for matches in a section of data, for GLZA you would want to not include that section of data in the suffix tree that is built for global match finding. I think this wouldn't be much time saving if the data doesn't have long matches because the code doesn't have to traverse very far in the tree before it reaches a leaf. I think what's really important is being able to efficiently find the strings that provide the most immediate compression while minimizing the loss in "future" compressibility of the file. So smart string finding via parsing and HTML token recognition is of interest to me because it may improve speed (significantly?) and compression ratio (slightly?), plus it seems like there may be some synergies with tag finding, so I'd like to find some decent specs on the structures to get a better idea of what smarter string matching can mean.
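
    As a toy illustration of "global" match finding, the Python sketch below scores every repeated substring of the whole input and picks the one with the largest estimated saving. It is a brute-force stand-in, not GLZA's suffix-tree algorithm. The cost model (one symbol per use plus the rule body stored once) and the function name are assumptions made for the example.

```python
def best_repeated_substring(text, min_len=2, max_len=16):
    """Return (substring, count, saving) for the repeated substring with the
    largest estimated saving over the whole text, or None if nothing pays off.

    Assumed cost model: each occurrence is replaced by one symbol and the
    substring itself is stored once as the rule body.
    """
    candidates = {
        text[i:i + n]
        for n in range(min_len, min(max_len, len(text)) + 1)
        for i in range(len(text) - n + 1)
    }
    best = None
    for s in candidates:
        count = text.count(s)  # non-overlapping occurrences
        saving = count * len(s) - (count + len(s))
        if count >= 2 and saving > 0:
            key = (saving, len(s), s)  # deterministic tie-break
            if best is None or key > best[0]:
                best = (key, (s, count, saving))
    return None if best is None else best[1]

print(best_repeated_substring("abcabcabc xyz"))
```

    A real implementation would repeat this after each substitution and would use a suffix tree instead of enumerating substrings, which is far too slow for large files.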

    Quote Originally Posted by SolidComp View Post
    Decompression memory and CPU use are most important to me. I asked because I had seen some discussion of these topics from a few months ago, I think, and it sounded like there were going to be code changes aimed at improving memory use, but I might be remembering it wrong.
    There were code changes a while back that decreased memory usage for decoding. v0.5 - v0.7 are best, generally Pareto Frontier decoding memory for large texty files and a few MB for small files, which can probably be improved. Starting with v0.7.1, I changed the decoder to support a buffer with all the decoded data to be compatible with lzbench, so memory usage is not as good and virtual memory use went way up until I find time to add some code. I need to decide whether to put the standalone version back to the way it was or not. When I started I only thought about decompression to disk, but I was a novice at compression. Now I think decompression to RAM may be more important (or at least a better fit for what GLZA does well) and am thinking it might be better to live with having decompression require at least as much memory as the decompressed data takes and have the dictionary pointers point to history rather than a separate dictionary.

    Quote Originally Posted by SolidComp View Post
    I think we get too distracted by the glitter of compression ratios and speed, but a lot of these "winning" codecs use enormous memory and CPU. If something uses 1 GiB of RAM and 100% of a flagship Android phone's CPU to decompress then it's only usable on powerful desktops. I'm most interested in codecs that can replace gzip on the web, in mobile devices and such, so the target content would be much smaller than enwik9 and diverse – HTML, CSS, and JS files, maybe JSON and CSV data (there's an upcoming CSV in HTML standard).
    Just to be clear, GLZA does use lots of memory and CPU for compression compared to most or all LZxx compressors. Decompression characteristics are much better; if there's a problem on an Android, then it's something I did wrong in the coding. In the medium to long run, I could see where GLZ could be very useful for decompressing web data. In the short term, it's probably a little immature. Compression ratios are likely to improve at least a little over time, speed can be improved, code can be cleaner, etc.

  37. #809
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    233
    Thanks
    92
    Thanked 47 Times in 31 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Thanks for the clarification. This does seem strange. I wonder if it's related to deflate's low memory usage.

    So they have specs? I have been looking and not finding anything useful but maybe need to look harder. I don't know much about parsers but from a little reading it sounds like something like an LL parser should be used.
    Yeah, there are specs for HTML, CSS, and JS, though for HTML there are two different specs from different organizations. Thankfully, the two specs are similar enough, but they're extremely poorly written and constructed. I guess the science and art of spec writing is somewhat immature, and we don't have a reliable pipeline of people trained in it. They should also expand their medium to include authoritative graphical representations of some of their concepts, and use machine-readable formats as well. They rely too much on English, but they don't know how to write good specs in English. Here's one of the HTML5 specs: https://www.w3.org/TR/html5/

    You may already know this but GLZA's string matching is a lot different from LZ77 or deflate. Instead of searching for the most efficient local transmission of matches/literals the algorithm recursively searches for the best global matches and creates rules for those. So instead of knowing not to bother with looking for matches in a section of data, for GLZA you would want to not include that section of data in the suffix tree that is built for global match finding. I think this wouldn't be much time saving if the data doesn't have long matches because the code doesn't have to traverse very far in the tree before it reaches a leaf. I think what's really important is being able to efficiently find the strings that provide the most immediate compression while minimizing the loss in "future" compressibility of the file. So smart string finding via parsing and HTML token recognition is of interest to me because it may improve speed (significantly?) and compression ratio (slightly?), plus it seems like there may be some synergies with tag finding, so I'd like to find some decent specs on the structures to get a better idea of what smarter string matching can mean.
    I'm embarrassed to admit that I'm not familiar with GLZA's string matching algorithm(s), just with GLZA's remarkable performance. Your approach to smarter string matching sounds promising. gzip isn't aware that it is compressing HTML, CSS, or JS. It has no idea that body> won't be repeated until near the end of the document, doesn't have any concept of tags or attributes or any idea what to expect. There are interesting opportunities for smarter string matching. Take the following snippet from Apple's iPhone page:

    Code:
    <meta property="analytics-s-channel" content="iphone.tab+other" />
    <meta property="analytics-s-bucket-0" content="appleglobal,apple{COUNTRY_CODE}iphonetab" />
    <meta property="analytics-s-bucket-1" content="apple{COUNTRY_CODE}global,apple{COUNTRY_CODE}iphonetab" />
    <meta property="analytics-s-bucket-2" content="apple{COUNTRY_CODE}global" />
    <meta property="analytics-s-bucket-store" content="applestoreww,applestoreamr,applestoreus" />
    Gzip/deflate will lose the match after bucket-, and will need to code the enumerations as literals 0, 1, 2, etc., then start a new match (with new symbols) with " content="apple, or something along those lines. This isn't the best example of broken-up matches, but it illustrates some of the common phenomena. There's a lot of opportunity for smarter interpolated or permuted matches, and much of it could be informed by the nature of the section or tag. (meta tags are good places to look for such strings.) Brotli has an interesting permutable dictionary that I'd like to dig into some more. ("permutable dictionary" is what I call it – I don't know if there's a more official label for that kind of dictionary)

    There's also an opportunity for manipulating/changing the data, which we normally assume we can't do when compressing. In the case of HTML, CSS, and JS, simple minification and normalization transformations are possible without changing the meaning of the code. For example, a lot of times the above meta elements will vary in how the tag is closed. Here we see the XML-style />, but in HTML5 these elements are supposed to be closed without a slash, just with >. You'll often see some with the slash, and some without, in the same HTML file, and with or without a space before, which busts the deflate string match into at least two permutations (assuming there's a match at the end of the element, or combining the tag closure with the start of the next element, like /><meta). A compressor that knew that it could normalize all those as no-space and no-slash would help in some cases. (and stripping all the CRs from CRLF combos, normalizing the ordering of attributes, and lots of other things.)
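
    A small Python sketch of that kind of normalization: strip CRs from CRLF pairs and rewrite the closing of a few void elements to a bare >. The tag list and the regular expression are illustrative assumptions, and the pattern is deliberately naive (it assumes no < or > inside attribute values), so treat it as a demonstration rather than a safe HTML rewriter.

```python
import re
import zlib

# Assumption: a short, hand-picked list of void elements to normalize.
VOID_TAG = re.compile(
    r"<(meta|link|br|hr|img|input)\b([^<>]*?)\s*/?>",
    re.IGNORECASE,
)

def normalize(html: str) -> str:
    """Strip CRs from CRLF pairs and close void elements with a bare '>'."""
    html = html.replace("\r\n", "\n")
    return VOID_TAG.sub(r"<\1\2>", html)

# Made-up input mixing the three closing styles on CRLF-terminated lines.
source = (
    '<meta property="a" content="1" />\r\n'
    '<meta property="b" content="2"/>\r\n'
    '<meta property="c" content="3">\r\n'
)
cleaned = normalize(source)
print(cleaned)
print("deflate size before:", len(zlib.compress(source.encode(), 9)),
      "after:", len(zlib.compress(cleaned.encode(), 9)))
```

    The two printed sizes show the effect on this one input only; the gain on real pages depends on how inconsistent the markup is to begin with.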

    Just to be clear, GLZA does use lots of memory and CPU for compression compared to most or all LZxx compressors. Decompression characteristics are much better; if there's a problem on an Android, then it's something I did wrong in the coding. In the medium to long run, I could see where GLZ could be very useful for decompressing web data. In the short term, it's probably a little immature. Compression ratios are likely to improve at least a little over time, speed can be improved, code can be cleaner, etc.
    Oh I didn't mean to suggest that there was anything wrong with GLZA, on Android or any other platform. My point was a general one. I have no idea how GLZA performs in terms of decomp resources, and I wholly agree that it's too early to hold GLZA to high standards for efficiency as a web compression format. I think it has great potential. One other consideration on smart string matching is that with the web, where you have both HTML and CSS content referring to the same elements or objects in a tree like the DOM, there's an opportunity to bundle the attributes and style directives together on the encoding of the relevant element. A compressor that did this would eliminate a lot of bloat and redundancy in how a CSS file repeats strings from an HTML file.

    FYI, Mahoney's LTCB page says that GLZA is not a general-purpose compressor and is only designed to compress enwik9. I take it things have changed and this is no longer the case? You might want to have him update that passage.

  38. The Following User Says Thank You to SolidComp For This Useful Post:

    Kennon Conrad (1st September 2016)

  39. #810
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    685
    Thanks
    153
    Thanked 177 Times in 105 Posts

    GLZA v0.8

    GLZA v0.8 includes the following changes compared to v0.7.1:

    1. Bug fix for files that start with three or more capital letters, and a "bug" fix for incorrect delta filter decision math that made lzt24 compression worse than it should be.
    2. There are now 17 dictionaries and models for extended UTF-8 symbols instead of 1, so that Greek, Latin, Cyrillic, Hebrew, etc. can have unique trailing/leading symbol models. This should improve compression of multi-lingual files, and it helps on the wikis, but my test set is limited.
    3. Lots of little changes that sometimes give faster compression, typically just a few percent, but up to about 500% faster on some of my test files.
    4. Added the -c#, -p#, and -r# command line options back in: c is the production cost in bits, p is a factor used to favor longer strings over the most compressive ones, and r sets the compression memory use in MB.

    Results for enwik8:
    GLZA c enwik8 enwik8.glza: 20,472,828 bytes in 431 sec., 3,321 MB; decompress 1.8 sec., 47 MB.
    GLZA c -p3 enwik8 enwik8.glza: 20,442,490 bytes in 493 sec., 3,257 MB; decompress 1.8 sec., 48 MB.

    Results for enwik9:
    GLZA c enwik9 enwik9.glza: 164,943,294 bytes in 9,328 sec., 12,673 MB; decompress 15.8 sec., 363 MB.
    GLZA c -p3 enwik9 enwik9.glza: 164,634,038 bytes in 10,106 sec., 12,369 MB; decompress 15.8 sec., 364 MB.

    The source code in a .zip file is 64,327 bytes so enwik9 (-p3) + code = 164,698,365 bytes. Matt, if you read this, could you also update the description as SolidComp mentions to indicate GLZA is general purpose (but most effective on text)?
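For anyone who wants to check the arithmetic, a quick Python sketch. The compressed sizes are the -p3 results quoted above; the uncompressed sizes are the standard ones (enwik8 is 10^8 bytes, enwik9 is 10^9 bytes), and the variable names are only for illustration.

```python
ENWIK8_SIZE = 100_000_000    # enwik8: first 10^8 bytes of the dump
ENWIK9_SIZE = 1_000_000_000  # enwik9: first 10^9 bytes of the dump

enwik8_p3 = 20_442_490   # GLZA c -p3 enwik8
enwik9_p3 = 164_634_038  # GLZA c -p3 enwik9
source_zip = 64_327      # zipped source code

# Compressed enwik9 plus the zipped source, as summed in the post.
total = enwik9_p3 + source_zip
print(f"enwik9 (-p3) + code = {total:,} bytes")

# Compressed size as a share of the original, and bits per input byte.
for name, packed, original in [
    ("enwik8", enwik8_p3, ENWIK8_SIZE),
    ("enwik9", enwik9_p3, ENWIK9_SIZE),
]:
    share = packed / original
    print(f"{name}: {100 * share:.2f}% of original, {8 * share:.3f} bits per byte")
```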

    Top twenty results by compression ratio on the ten wikis from the xml test in Stephan Busch's Squeeze Chart, run with my custom lzbench (Ratio is the compressed size as a percentage of the original, so lower is better):

    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.17 MB/s    77 MB/s    15111331  15.11 arwiki-20090209-pages-articles.xml
    xz 5.2.2 -9              1.88 MB/s   127 MB/s    17801068  17.80 arwiki-20090209-pages-articles.xml
    lzlib 1.7 -9             1.61 MB/s    92 MB/s    17898415  17.90 arwiki-20090209-pages-articles.xml
    csc 3.3 -5               2.17 MB/s   118 MB/s    18468350  18.47 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -22           1.87 MB/s   595 MB/s    18666760  18.67 arwiki-20090209-pages-articles.xml
    lzma 9.38 -5             2.27 MB/s   133 MB/s    18892558  18.89 arwiki-20090209-pages-articles.xml
    tornado 0.6a -16         1.84 MB/s   277 MB/s    18927009  18.93 arwiki-20090209-pages-articles.xml
    xz 5.2.2 -6              2.42 MB/s   123 MB/s    18943046  18.94 arwiki-20090209-pages-articles.xml
    lzlib 1.7 -6             2.28 MB/s    90 MB/s    19157260  19.16 arwiki-20090209-pages-articles.xml
    brotli 0.4.0 -11         0.56 MB/s   599 MB/s    19455349  19.46 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -18           3.66 MB/s   796 MB/s    20786291  20.79 arwiki-20090209-pages-articles.xml
    tornado 0.6a -13         6.21 MB/s   255 MB/s    21622792  21.62 arwiki-20090209-pages-articles.xml
    tornado 0.6a -10         7.39 MB/s   251 MB/s    22082206  22.08 arwiki-20090209-pages-articles.xml
    csc 3.3 -3               7.34 MB/s    81 MB/s    22085718  22.09 arwiki-20090209-pages-articles.xml
    zling 2016-01-10 -4        48 MB/s   245 MB/s    22407488  22.41 arwiki-20090209-pages-articles.xml
    lzham 1.0 -d26 -1        3.20 MB/s   305 MB/s    22416942  22.42 arwiki-20090209-pages-articles.xml
    zling 2016-01-10 -3        61 MB/s   244 MB/s    22758777  22.76 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -15           7.55 MB/s   838 MB/s    23045717  23.05 arwiki-20090209-pages-articles.xml
    brotli 0.4.0 -8            11 MB/s   577 MB/s    23319342  23.32 arwiki-20090209-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.19 MB/s    58 MB/s    18432446  18.43 dewiki-20090311-pages-articles.xml
    xz 5.2.2 -9              1.76 MB/s   108 MB/s    22337284  22.34 dewiki-20090311-pages-articles.xml
    lzlib 1.7 -9             1.59 MB/s    77 MB/s    22559912  22.56 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -22           1.81 MB/s   505 MB/s    23012645  23.01 dewiki-20090311-pages-articles.xml
    tornado 0.6a -16         1.83 MB/s   241 MB/s    23367196  23.37 dewiki-20090311-pages-articles.xml
    lzma 9.38 -5             2.11 MB/s   113 MB/s    23383008  23.38 dewiki-20090311-pages-articles.xml
    csc 3.3 -5               2.66 MB/s    89 MB/s    23499178  23.50 dewiki-20090311-pages-articles.xml
    xz 5.2.2 -6              2.31 MB/s   103 MB/s    23927507  23.93 dewiki-20090311-pages-articles.xml
    lzlib 1.7 -6             2.17 MB/s    77 MB/s    24066730  24.07 dewiki-20090311-pages-articles.xml
    brotli 0.4.0 -11         0.63 MB/s   504 MB/s    24843180  24.84 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -18           3.54 MB/s   684 MB/s    25519360  25.52 dewiki-20090311-pages-articles.xml
    tornado 0.6a -13         5.73 MB/s   236 MB/s    25539161  25.54 dewiki-20090311-pages-articles.xml
    tornado 0.6a -10         6.90 MB/s   217 MB/s    26153155  26.15 dewiki-20090311-pages-articles.xml
    csc 3.3 -3               6.79 MB/s    78 MB/s    26344037  26.34 dewiki-20090311-pages-articles.xml
    lzham 1.0 -d26 -1        2.82 MB/s   248 MB/s    27006473  27.01 dewiki-20090311-pages-articles.xml
    tornado 0.6a -7            20 MB/s   225 MB/s    27527486  27.53 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -4        43 MB/s   196 MB/s    27540390  27.54 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -3        52 MB/s   205 MB/s    27835794  27.84 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -15           7.76 MB/s   713 MB/s    28192801  28.19 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -2        62 MB/s   201 MB/s    28309613  28.31 dewiki-20090311-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.15 MB/s    54 MB/s    20663809  20.66 enwiki-20090306-pages-articles.xml
    csc 3.3 -5               3.20 MB/s    76 MB/s    24730845  24.73 enwiki-20090306-pages-articles.xml
    xz 5.2.2 -9              1.65 MB/s    94 MB/s    24897946  24.90 enwiki-20090306-pages-articles.xml
    lzlib 1.7 -9             1.56 MB/s    71 MB/s    25193126  25.19 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -22           1.72 MB/s   451 MB/s    25471920  25.47 enwiki-20090306-pages-articles.xml
    tornado 0.6a -16         1.75 MB/s   202 MB/s    25869767  25.87 enwiki-20090306-pages-articles.xml
    lzma 9.38 -5             1.96 MB/s   102 MB/s    25919628  25.92 enwiki-20090306-pages-articles.xml
    xz 5.2.2 -6              2.23 MB/s    92 MB/s    26385363  26.39 enwiki-20090306-pages-articles.xml
    lzlib 1.7 -6             2.06 MB/s    70 MB/s    26467752  26.47 enwiki-20090306-pages-articles.xml
    csc 3.3 -3               6.81 MB/s    71 MB/s    26623711  26.62 enwiki-20090306-pages-articles.xml
    brotli 0.4.0 -11         0.61 MB/s   460 MB/s    27035931  27.04 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -18           3.39 MB/s   645 MB/s    27729827  27.73 enwiki-20090306-pages-articles.xml
    tornado 0.6a -13         5.69 MB/s   218 MB/s    27948707  27.95 enwiki-20090306-pages-articles.xml
    csc 3.3 -1                 20 MB/s    69 MB/s    28747189  28.75 enwiki-20090306-pages-articles.xml
    tornado 0.6a -10         6.23 MB/s   194 MB/s    28997414  29.00 enwiki-20090306-pages-articles.xml
    lzham 1.0 -d26 -1        2.74 MB/s   250 MB/s    29325456  29.33 enwiki-20090306-pages-articles.xml
    zling 2016-01-10 -4        42 MB/s   194 MB/s    29540358  29.54 enwiki-20090306-pages-articles.xml
    zling 2016-01-10 -3        49 MB/s   193 MB/s    29804829  29.80 enwiki-20090306-pages-articles.xml
    tornado 0.6a -7            19 MB/s   208 MB/s    30004796  30.00 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -15           7.81 MB/s   673 MB/s    30244817  30.24 enwiki-20090306-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.14 MB/s    57 MB/s    19487875  19.49 eswiki-20090124-pages-articles.xml
    xz 5.2.2 -9              1.67 MB/s   104 MB/s    22973553  22.97 eswiki-20090124-pages-articles.xml
    lzlib 1.7 -9             1.55 MB/s    76 MB/s    23257551  23.26 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -22           1.72 MB/s   479 MB/s    23659939  23.66 eswiki-20090124-pages-articles.xml
    tornado 0.6a -16         1.76 MB/s   233 MB/s    24081543  24.08 eswiki-20090124-pages-articles.xml
    lzma 9.38 -5             1.98 MB/s   109 MB/s    24101658  24.10 eswiki-20090124-pages-articles.xml
    csc 3.3 -5               2.22 MB/s    93 MB/s    24132085  24.13 eswiki-20090124-pages-articles.xml
    xz 5.2.2 -6              2.25 MB/s    99 MB/s    24604100  24.60 eswiki-20090124-pages-articles.xml
    lzlib 1.7 -6             2.07 MB/s    74 MB/s    24706819  24.71 eswiki-20090124-pages-articles.xml
    brotli 0.4.0 -11         0.60 MB/s   479 MB/s    25323130  25.32 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -18           3.40 MB/s   670 MB/s    26161278  26.16 eswiki-20090124-pages-articles.xml
    tornado 0.6a -13         5.75 MB/s   228 MB/s    26379024  26.38 eswiki-20090124-pages-articles.xml
    tornado 0.6a -10         6.55 MB/s   207 MB/s    27067931  27.07 eswiki-20090124-pages-articles.xml
    csc 3.3 -3               6.23 MB/s    79 MB/s    27453650  27.45 eswiki-20090124-pages-articles.xml
    lzham 1.0 -d26 -1        2.86 MB/s   262 MB/s    27968904  27.97 eswiki-20090124-pages-articles.xml
    tornado 0.6a -7            19 MB/s   220 MB/s    28386862  28.39 eswiki-20090124-pages-articles.xml
    zling 2016-01-10 -4        43 MB/s   200 MB/s    28438083  28.44 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -15           7.74 MB/s   705 MB/s    28707523  28.71 eswiki-20090124-pages-articles.xml
    zling 2016-01-10 -3        51 MB/s   200 MB/s    28724958  28.72 eswiki-20090124-pages-articles.xml
    brotli 0.4.0 -8          9.98 MB/s   541 MB/s    29074937  29.07 eswiki-20090124-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.18 MB/s    59 MB/s    18718785  18.72 frwiki-20090224-pages-articles.xml
    xz 5.2.2 -9              1.77 MB/s   108 MB/s    21992043  21.99 frwiki-20090224-pages-articles.xml
    lzlib 1.7 -9             1.60 MB/s    79 MB/s    22236448  22.24 frwiki-20090224-pages-articles.xml
    csc 3.3 -5               2.05 MB/s   105 MB/s    22430756  22.43 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -22           1.79 MB/s   506 MB/s    22741082  22.74 frwiki-20090224-pages-articles.xml
    lzma 9.38 -5             2.10 MB/s   114 MB/s    23059972  23.06 frwiki-20090224-pages-articles.xml
    tornado 0.6a -16         1.82 MB/s   242 MB/s    23137970  23.14 frwiki-20090224-pages-articles.xml
    xz 5.2.2 -6              2.34 MB/s   101 MB/s    23412050  23.41 frwiki-20090224-pages-articles.xml
    lzlib 1.7 -6             2.17 MB/s    77 MB/s    23546325  23.55 frwiki-20090224-pages-articles.xml
    brotli 0.4.0 -11         0.62 MB/s   501 MB/s    24094280  24.09 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -18           3.53 MB/s   698 MB/s    25031960  25.03 frwiki-20090224-pages-articles.xml
    tornado 0.6a -13         5.81 MB/s   236 MB/s    25363574  25.36 frwiki-20090224-pages-articles.xml
    tornado 0.6a -10         6.79 MB/s   217 MB/s    25972505  25.97 frwiki-20090224-pages-articles.xml
    csc 3.3 -3               6.41 MB/s    86 MB/s    26377779  26.38 frwiki-20090224-pages-articles.xml
    lzham 1.0 -d26 -1        2.87 MB/s   277 MB/s    26777656  26.78 frwiki-20090224-pages-articles.xml
    zling 2016-01-10 -4        44 MB/s   207 MB/s    27150257  27.15 frwiki-20090224-pages-articles.xml
    tornado 0.6a -7            20 MB/s   227 MB/s    27305793  27.31 frwiki-20090224-pages-articles.xml
    zling 2016-01-10 -3        52 MB/s   205 MB/s    27435686  27.44 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -15           7.93 MB/s   725 MB/s    27481008  27.48 frwiki-20090224-pages-articles.xml
    brotli 0.4.0 -8            10 MB/s   550 MB/s    27778927  27.78 frwiki-20090224-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.14 MB/s   100 MB/s     9960057   9.96 hiwiki-20090201-pages-articles.xml
    lzlib 1.7 -9             1.70 MB/s   125 MB/s    11841707  11.84 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -9              2.64 MB/s   180 MB/s    11987284  11.99 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -22           2.07 MB/s   930 MB/s    12584945  12.58 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -6              3.16 MB/s   174 MB/s    12707044  12.71 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -16         2.19 MB/s   394 MB/s    12786422  12.79 hiwiki-20090201-pages-articles.xml
    csc 3.3 -5               3.47 MB/s   163 MB/s    12794301  12.79 hiwiki-20090201-pages-articles.xml
    brotli 0.4.0 -11         0.62 MB/s   796 MB/s    12939409  12.94 hiwiki-20090201-pages-articles.xml
    lzlib 1.7 -6             3.16 MB/s   119 MB/s    13096161  13.10 hiwiki-20090201-pages-articles.xml
    lzma 9.38 -5             3.49 MB/s   185 MB/s    13111327  13.11 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -18           5.04 MB/s  1101 MB/s    14466596  14.47 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -10           10 MB/s   367 MB/s    14756446  14.76 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -13         7.80 MB/s   350 MB/s    14903736  14.90 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -15           8.63 MB/s  1139 MB/s    15476283  15.48 hiwiki-20090201-pages-articles.xml
    brotli 0.4.0 -8            19 MB/s   786 MB/s    15597658  15.60 hiwiki-20090201-pages-articles.xml
    zling 2016-01-10 -4        75 MB/s   343 MB/s    15662654  15.66 hiwiki-20090201-pages-articles.xml
    lzham 1.0 -d26 -1        3.72 MB/s   421 MB/s    15666744  15.67 hiwiki-20090201-pages-articles.xml
    csc 3.3 -3                 11 MB/s   106 MB/s    15750979  15.75 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -3                14 MB/s   141 MB/s    15880497  15.88 hiwiki-20090201-pages-articles.xml
    zling 2016-01-10 -3        94 MB/s   341 MB/s    15933596  15.93 hiwiki-20090201-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.13 MB/s    58 MB/s    17633722  17.63 ptwiki-20090128-pages-articles.xml
    xz 5.2.2 -9              1.89 MB/s   112 MB/s    20962447  20.96 ptwiki-20090128-pages-articles.xml
    lzlib 1.7 -9             1.63 MB/s    82 MB/s    21157830  21.16 ptwiki-20090128-pages-articles.xml
    csc 3.3 -5               2.22 MB/s   110 MB/s    21387628  21.39 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -22           1.79 MB/s   532 MB/s    21611704  21.61 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -16         1.85 MB/s   254 MB/s    21990066  21.99 ptwiki-20090128-pages-articles.xml
    lzma 9.38 -5             2.27 MB/s   118 MB/s    22140923  22.14 ptwiki-20090128-pages-articles.xml
    xz 5.2.2 -6              2.50 MB/s   106 MB/s    22401452  22.40 ptwiki-20090128-pages-articles.xml
    lzlib 1.7 -6             2.34 MB/s    80 MB/s    22598101  22.60 ptwiki-20090128-pages-articles.xml
    brotli 0.4.0 -11         0.61 MB/s   508 MB/s    22928058  22.93 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -13         6.16 MB/s   247 MB/s    23956744  23.96 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -18           3.79 MB/s   733 MB/s    24044270  24.04 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -10         7.17 MB/s   227 MB/s    24693567  24.69 ptwiki-20090128-pages-articles.xml
    csc 3.3 -3               6.91 MB/s    91 MB/s    24879954  24.88 ptwiki-20090128-pages-articles.xml
    lzham 1.0 -d26 -1        2.98 MB/s   285 MB/s    25432622  25.43 ptwiki-20090128-pages-articles.xml
    zling 2016-01-10 -4        48 MB/s   219 MB/s    25704231  25.70 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -7            21 MB/s   239 MB/s    25768559  25.77 ptwiki-20090128-pages-articles.xml
    zling 2016-01-10 -3        56 MB/s   218 MB/s    25959462  25.96 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -15           8.32 MB/s   762 MB/s    26164752  26.16 ptwiki-20090128-pages-articles.xml
    brotli 0.4.0 -8            11 MB/s   545 MB/s    26172167  26.17 ptwiki-20090128-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.26 MB/s    82 MB/s    13723940  13.72 ruwiki-20081228-pages-articles.xml
    xz 5.2.2 -9              1.88 MB/s   139 MB/s    16379184  16.38 ruwiki-20081228-pages-articles.xml
    lzlib 1.7 -9             1.58 MB/s    94 MB/s    16559388  16.56 ruwiki-20081228-pages-articles.xml
    csc 3.3 -5               2.16 MB/s   130 MB/s    17094657  17.09 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -22           1.83 MB/s   629 MB/s    17154468  17.15 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -16         1.86 MB/s   301 MB/s    17444960  17.44 ruwiki-20081228-pages-articles.xml
    lzma 9.38 -5             2.29 MB/s   142 MB/s    17576244  17.58 ruwiki-20081228-pages-articles.xml
    xz 5.2.2 -6              2.36 MB/s   132 MB/s    17653495  17.65 ruwiki-20081228-pages-articles.xml
    lzlib 1.7 -6             2.25 MB/s    95 MB/s    17883170  17.88 ruwiki-20081228-pages-articles.xml
    brotli 0.4.0 -11         0.60 MB/s   643 MB/s    18283528  18.28 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -18           3.65 MB/s   831 MB/s    19555354  19.56 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -10         7.61 MB/s   276 MB/s    20069790  20.07 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -13         5.99 MB/s   277 MB/s    20596619  20.60 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -15           7.20 MB/s   866 MB/s    21593370  21.59 ruwiki-20081228-pages-articles.xml
    zling 2016-01-10 -4        47 MB/s   249 MB/s    21834612  21.83 ruwiki-20081228-pages-articles.xml
    lzham 1.0 -d26 -1        3.26 MB/s   315 MB/s    21886926  21.89 ruwiki-20081228-pages-articles.xml
    csc 3.3 -3               8.06 MB/s    85 MB/s    22092178  22.09 ruwiki-20081228-pages-articles.xml
    brotli 0.4.0 -8            12 MB/s   617 MB/s    22258010  22.26 ruwiki-20081228-pages-articles.xml
    zling 2016-01-10 -3        60 MB/s   246 MB/s    22388231  22.39 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -7            25 MB/s   270 MB/s    22484142  22.48 ruwiki-20081228-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.16 MB/s    57 MB/s    17364165  17.36 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -9              2.01 MB/s   113 MB/s    20105194  20.11 trwiki-20090207-pages-articles.xml
    lzlib 1.7 -9             1.65 MB/s    83 MB/s    20196223  20.20 trwiki-20090207-pages-articles.xml
    csc 3.3 -5               2.43 MB/s   110 MB/s    20598614  20.60 trwiki-20090207-pages-articles.xml
    zstd 0.8.0 -22           1.84 MB/s   546 MB/s    20884896  20.88 trwiki-20090207-pages-articles.xml
    lzma 9.38 -5             2.46 MB/s   119 MB/s    21260452  21.26 trwiki-20090207-pages-articles.xml
    tornado 0.6a -16         1.94 MB/s   252 MB/s    21307495  21.31 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -6              2.64 MB/s   108 MB/s    21373835  21.37 trwiki-20090207-pages-articles.xml
    lzlib 1.7 -6             2.52 MB/s    81 MB/s    21627986  21.63 trwiki-20090207-pages-articles.xml
    brotli 0.4.0 -11         0.60 MB/s   515 MB/s    21815872  21.82 trwiki-20090207-pages-articles.xml
    tornado 0.6a -13         6.19 MB/s   248 MB/s    22985956  22.99 trwiki-20090207-pages-articles.xml
    zstd 0.8.0 -18           4.04 MB/s   725 MB/s    23301538  23.30 trwiki-20090207-pages-articles.xml
    tornado 0.6a -10         7.31 MB/s   230 MB/s    23804940  23.80 trwiki-20090207-pages-articles.xml
    csc 3.3 -3               7.39 MB/s    93 MB/s    23848749  23.85 trwiki-20090207-pages-articles.xml
    lzham 1.0 -d26 -1        2.99 MB/s   288 MB/s    24197793  24.20 trwiki-20090207-pages-articles.xml
    zling 2016-01-10 -4        49 MB/s   226 MB/s    24598592  24.60 trwiki-20090207-pages-articles.xml
    zling 2016-01-10 -3        57 MB/s   224 MB/s    24813797  24.81 trwiki-20090207-pages-articles.xml
    brotli 0.4.0 -8            11 MB/s   543 MB/s    24855628  24.86 trwiki-20090207-pages-articles.xml
    tornado 0.6a -7            21 MB/s   238 MB/s    24907785  24.91 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -3              6.51 MB/s    96 MB/s    25078677  25.08 trwiki-20090207-pages-articles.xml
    Code:
    Compressor name         Compress. Decompress. Compr. size  Ratio Filename
    glza 0.8                 0.22 MB/s    55 MB/s    21376704  21.38 zhwiki-20090116-pages-articles.xml
    xz 5.2.2 -9              2.07 MB/s    90 MB/s    25256625  25.26 zhwiki-20090116-pages-articles.xml
    lzlib 1.7 -9             1.77 MB/s    69 MB/s    25515468  25.52 zhwiki-20090116-pages-articles.xml
    csc 3.3 -5               2.63 MB/s    88 MB/s    25762986  25.76 zhwiki-20090116-pages-articles.xml
    zstd 0.8.0 -22           1.93 MB/s   438 MB/s    26285914  26.29 zhwiki-20090116-pages-articles.xml
    lzma 9.38 -5             2.53 MB/s    97 MB/s    26461362  26.46 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -16         2.14 MB/s   207 MB/s    26754406  26.75 zhwiki-20090116-pages-articles.xml
    xz 5.2.2 -6              2.77 MB/s    87 MB/s    26876747  26.88 zhwiki-20090116-pages-articles.xml
    lzlib 1.7 -6             2.58 MB/s    67 MB/s    27044349  27.04 zhwiki-20090116-pages-articles.xml
    brotli 0.4.0 -11         0.59 MB/s   425 MB/s    27726251  27.73 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -13         6.20 MB/s   203 MB/s    28199772  28.20 zhwiki-20090116-pages-articles.xml
    csc 3.3 -3               6.25 MB/s    77 MB/s    28433217  28.43 zhwiki-20090116-pages-articles.xml
    zstd 0.8.0 -18           4.48 MB/s   600 MB/s    29416075  29.42 zhwiki-20090116-pages-articles.xml
    lzham 1.0 -d26 -1        2.66 MB/s   247 MB/s    29506389  29.51 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -10         6.42 MB/s   185 MB/s    30456354  30.46 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -4        37 MB/s   177 MB/s    30735528  30.74 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -3        42 MB/s   175 MB/s    30907553  30.91 zhwiki-20090116-pages-articles.xml
    csc 3.3 -1                 20 MB/s    77 MB/s    31113233  31.11 zhwiki-20090116-pages-articles.xml
    brotli 0.4.0 -8          8.86 MB/s   426 MB/s    31175292  31.18 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -2        47 MB/s   172 MB/s    31182794  31.18 zhwiki-20090116-pages-articles.xml
    GLZA's compressed files are 13% to 18% smaller than the #2 entry in each of the above cases. That's more consistent than I expected considering all the different languages. It seems like there is plenty of margin to allow for a much faster, slightly less effective production rule generator and/or encoder/decoder.
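The margin can be recomputed from the tables with a short Python sketch. The sizes are copied from the first two rows of each table above; the dictionary layout and names are just one way to arrange them.

```python
# (GLZA size, #2 entry size) in bytes, from the first two rows of each table.
top_two = {
    "arwiki": (15111331, 17801068),
    "dewiki": (18432446, 22337284),
    "enwiki": (20663809, 24730845),
    "eswiki": (19487875, 22973553),
    "frwiki": (18718785, 21992043),
    "hiwiki": (9960057, 11841707),
    "ptwiki": (17633722, 20962447),
    "ruwiki": (13723940, 16379184),
    "trwiki": (17364165, 20105194),
    "zhwiki": (21376704, 25256625),
}

# Percentage by which the GLZA output is smaller than the runner-up.
savings = {
    name: 100 * (1 - glza / second)
    for name, (glza, second) in top_two.items()
}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% smaller than #2")
print(f"range: {min(savings.values()):.1f}% to {max(savings.values()):.1f}%")
```

The smallest margin is on trwiki and the largest on dewiki.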

  40. The Following 9 Users Say Thank You to Kennon Conrad For This Useful Post:

    Gonzalo (27th September 2016),JamesB (3rd October 2016),Matt Mahoney (27th September 2016),Mike (27th September 2016),Nania Francesco (27th September 2016),Sportman (27th September 2016),Stephan Busch (27th September 2016),surfersat (28th September 2016),VadimV (1st October 2016)
