Page 9 of 10
Results 241 to 270 of 275

Thread: Brotli

  1. #241
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,918
    Thanks
    291
    Thanked 1,276 Times in 722 Posts
    > The compiler that is most intriguing to me is Intel's C/C++ compiler, called Parallel Studio.
    > What smattering of data exist suggests that it's better than gcc, Visual Studio, clang, etc.

    That was true 10 years ago, but unfortunately it has mostly changed. Intel hasn't improved their engine since 2009 or so
    (they added GPU offload, language extensions, new instructions, but no general-purpose optimizations - IC11 is frequently better than IC20 on scalar code),
    and now they've decided to migrate to clang completely: https://software.intel.com/en-us/art...llvm-framework

    I'd still use it because it's more user-friendly - it's easier to develop with IC, then add using directives and other syntax for gcc.
    Also, IC still sometimes wins on vector code and PGO.
    But in most cases gcc8 currently wins (gcc9 seems slightly worse in my tests), and on scalar code the new clang sometimes wins.
    VS is only good for exe size optimization.
    Code:
      0.562s:  brotli_ic19.exe -fo 1 -q 1 enwik8
      0.562s:  brotli_cl90.exe -fo 1 -q 1 enwik8
      0.547s:  brotli_gc82.exe -fo 1 -q 1 enwik8
      0.921s:  brotli_vc19.exe -fo 1 -q 1 enwik8
    
      0.390s:  brotli_ic19.exe -d -fo 2 1
      0.437s:  brotli_cl90.exe -d -fo 2 1
      0.375s:  brotli_gc82.exe -d -fo 2 1
      0.594s:  brotli_vc19.exe -d -fo 2 1

  2. #242
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    We developed and optimized the Brotli code (in 2013–2015) on gcc, so it being slow when built with Microsoft's compiler is not necessarily a problem in their compiler. If someone figured out the reason and made it perform better with MSVC, we would welcome such changes in the repository. My guess is that there could be unfortunate inlining decisions.

  3. #243
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Hi all – I'm impressed with the results of compressing jQuery with brotli, compared to Zstd and libdeflate (gzip):

    Original jQuery 3.5.1 (latest): 89,476 bytes (this is the minified production version from jQuery.com: Link)

    libdeflate 1.6 gzip -11: 36,043 (libdeflate adds two extra compression levels to zlib gzip's nine)
    Zstd 1.4.5 -22: 29,453
    brotli 1.0.4 -11: 28,007
    brotli 1.0.7 -11: 27,954

    Update: 7-Zip's gzipper is incredible: 29,960 bytes. I'm not sure why it's so much better than libdeflate, or how it's so close to Zstd and brotli.

    Compression of web files is much more important to me than the weird benchmarks that are typically used. And this is where brotli shines, not surprisingly since it was designed for the web. Note that brotli has a dictionary, generated from web files, whereas Zstd and libdeflate do not. You can generate a dictionary with Zstd, but it keeps giving me an error saying there aren't enough samples...

    Brotli 1.0.7 performs slightly better than 1.0.4, which was surprising since there was nothing in the release notes that indicated improvements for the max compression setting (-11), just an improvement for the -1 setting. The only other difference is that I compiled my 1.0.7 version myself in Visual Studio 2019, dynamically linked, whereas my 1.0.4 version is the official build from GitHub, a static executable (1.0.4 is the last version they released builds for – all they've released is source code for 1.0.5 through 1.0.7).

    Jyrki, should compression results (size) be exactly the same across compilers, deterministic given the same settings? So it has to be something in the source code of 1.0.7 compared to 1.0.4 that explains the improvement, right, not Visual Studio?
    Last edited by SolidComp; 24th May 2020 at 20:33.

  4. #244
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,918
    Thanks
    291
    Thanked 1,276 Times in 722 Posts
    It's an unfair comparison, since both brotli and zstd support external dictionaries, but brotli silently uses its integrated one
    Code:
    89,476 jquery-3.5.1.min.js
    27,959 jquery-3.5.1.min.bro // brotli_gc82.exe -q 11 -fo jquery-3.5.1.min.bro  jquery-3.5.1.min.js 
    28,754 jquery-3.5.1.min.bro // brotli with dictionary zeroed out in .exe
    29,453 jquery-3.5.1.min.zst // zstd.exe --ultra -22 -fo jquery-3.5.1.min.zst  jquery-3.5.1.min.js 
    28,942 jquery-3.5.1.min.zst // zstd.exe --ultra -22 -fo jquery-3.5.1.min.zst -D dictionary.bin jquery-3.5.1.min.js
    This example uses brotli's default dictionary for zstd, but we can generate a custom dictionary for zstd, while it is harder to do for brotli.

    Yes, brotli at max settings has a stronger entropy model than zstd. But its encoding is also 5x slower and its decoding 2x slower.
    And the actual compression difference is still uncertain, since we can build a larger dictionary specialized for the target data with zstd --train.

  5. Thanks:

    Jyrki Alakuijala (24th May 2020)

  6. #245
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    I tried to create a dictionary with --train, but I get an error saying not enough samples or something. I tried it with just the jQuery file as the training sample, which used to work in the past. Then I tried two, then three, then four jQuery files (the previous versions, 3.5.0, 3.4.1, etc.), and still get the error even with four files. Not sure what I'm doing wrong.

  7. #246
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,918
    Thanks
    291
    Thanked 1,276 Times in 722 Posts
    Normally it just needs more data for training.
    But here I made a workaround for you:

    Update: attachment deleted since there was a mistake in the script and it didn't work.

  8. #247
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    I don't remember what was improved. Perhaps hashing.

    There is a lot of floating point in brotli qualities 10 and 11. I don't know how compiler-invariant it is the way we do it. I'd assume it to be well-defined, but this could be a naive position.

  9. #248
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Shelwien View Post
    Normally it just needs more data for training.
    But here I made a workaround for you:

    Where's the dictionary?

  10. #249
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    I don't remember what was improved. Perhaps hashing.

    There is a lot of floating point in brotli 10 and 11. I don't know how compiler invariant it is the way we do it. I'd assume it to be well-defined, but this could be a naive position.
    Do you specify fast math in the makefile or cmake?

  11. #250
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Based on a quick look at the makefiles, we are not using the fast-math option. However, there can be more uncertainty, like perhaps using multiply-and-add as a single instruction leading to a different result than doing multiply and add as two separate instructions. (I'm a bit out of touch with this field. Compilers, vectorization and new instructions improve constantly.)
    Last edited by Jyrki Alakuijala; 25th May 2020 at 13:20.
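    The caveat about fused versus separate instructions is easy to demonstrate: floating-point arithmetic is not associative, so any regrouping the compiler performs (including fusing a multiply and an add into one FMA, which skips an intermediate rounding) can legitimately change the output. A minimal illustration:

    ```python
    # Floating-point addition is not associative: each operation rounds to the
    # nearest representable double, so the grouping changes which roundings happen.
    a, b, c = 0.1, 0.2, 0.3

    left = (a + b) + c    # rounds after a + b
    right = a + (b + c)   # rounds after b + c

    print(left == right)  # False
    print(left, right)    # 0.6000000000000001 0.6
    ```

    The same effect applies to brotli's cost-model floats: identical source, different instruction selection, bitwise-different (though still valid) output.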

  12. #251
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    887
    Thanks
    481
    Thanked 278 Times in 118 Posts
    I tried it with just the jQuery file as the training sample, which used to work in the past. Then I tried two, then three, then four jQuery files (the previous versions, 3.5.0, 3.4.1, etc.), and still get the error even with four files. Not sure what I'm doing wrong.
    You could try it 5 times.
    Assuming that the source file is ~90K, this should force the trainer to provide a dictionary from this material.

    Note though that the produced dictionary will be highly tuned for this specific file, which is not the target model.
    In a production environment, we tend to use ~10K samples, randomly extracted from an even larger pool, in order to generate a dictionary for a category of documents.

  13. #252
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,918
    Thanks
    291
    Thanked 1,276 Times in 722 Posts
    @SolidComp:
    Sorry, I left a mistake after renaming samples subdirectory :)

    After running gen.bat, the dictionary is in the file named "dictionary".
    If you're on linux, you can just repeat the operations in gen.bat manually,
    zstd --train produces the dictionary, zstd -D compresses using it.

    Then there's also this option to control dictionary size:
    --maxdict=# : limit dictionary to specified size (default: 112640)
    Attached Files

  14. #253
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Cyan View Post
    You could try it 5 times.
    Assuming that the source file is ~90K, this should force the trainer to provide a dictionary from this material.

    Note though that the produced dictionary will be highly tuned for this specific file, which is not the target model.
    In a production environment, we tend to use ~10K samples, randomly extracted from an even larger pool, in order to generate a dictionary for a category of documents.
    Five times or five files? I added a fifth file, same error. Screenshot below:

    [Screenshot: Zstd error.JPG]

  15. #254
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    887
    Thanks
    481
    Thanked 278 Times in 118 Posts
    I think 5 samples is the absolute minimum.
    Sometimes, even that is not enough, when samples are pretty small.
    But 90K is relatively large, so that should do it (assuming you are adding multiple copies of the same file, adding empty files wouldn't work).

    Looking at your screenshot, I noticed a wildcard character `*`.
    I have no idea how shell expansion works on Windows.
    Chances are, it doesn't.

    Prefer using the `-r` option to load all files from a directory;
    this capability is internal to `zstd`, so it should work even on Windows, since it doesn't depend on any shell capability.
    Last edited by Cyan; 26th May 2020 at 18:58.

  16. #255
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    The same custom dictionaries that zstd uses can be used for brotli. In my testing, half the time Zstd's custom dictionary builder wins over Brotli's similar tool, half the time the opposite. Surprisingly often it is an even better strategy (for resulting compression density) to take 10 random samples of the data and concatenate them as a custom dictionary, rather than trying to be smart about it.

  17. #256
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Cyan View Post
    I think 5 samples is the absolute minimum.
    Sometimes, even that is not enough, when samples are pretty small.
    But 90K is relatively large, so that should do it (assuming you are adding multiple copies of the same file, adding empty files wouldn't work).

    Looking at your screenshot, I noticed a wildcard character `*`.
    I have no idea how shell expansion works on Windows.
    Chances are, it doesn't.

    Prefer using the `-r` option to load all files from a directory;
    this capability is internal to `zstd`, so it should work even on Windows, since it doesn't depend on any shell capability.
    Wow, it shrunk jQuery down to 10 KB! That's impressive. The dictionary is 110 KB, but that's a one-time hit. There were error messages on dictionary creation, though. I don't really understand them:
    Code:
      PS C:\Tools\Zstd-1.4.5> zstd --train C:\Tools\Zstd-1.4.5\Train -r -o jQDict
      ! Warning : data size of samples too small for target dictionary size
      ! Samples should be about 100x larger than target dictionary size
      Trying 5 different sets of parameters
      WARNING: The maximum dictionary size 112640 is too large compared to the source size 359162! size(source)/size(dictionary) = 3.188583, but it should be >= 10! This may lead to a subpar dictionary! We recommend training on sources at least 10x, and preferably 100x the size of the dictionary!
      k=1024
      d=8
      f=20
      steps=4
      split=75
      accel=1
      Save dictionary of size 112640 into file jQDict
    [Screenshot: Zstd error 2.JPG]

  18. #257
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Quote Originally Posted by Shelwien View Post
    This example uses brotli's default dictionary for zstd, but we can generate a custom dictionary for zstd, while it is harder to do for brotli.
    This is likely a misunderstanding. Brotli can use the same linear dictionaries used in zstd, and the same tooling. The dictionary mechanism with a simple grammar is in addition to that, but ordinary linear dictionaries can be used. One just gets a bit less benefit from them (but not less than what zstd gets from these simple dictionaries). Zstd does not yet support transforms on dictionaries as far as I know.

  19. #258
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,568
    Thanks
    775
    Thanked 687 Times in 372 Posts
    AFAIK a zstd "dictionary" is just prepended data for LZ matches. This approach can be used with any LZ compressor.

    The brotli dictionary, by contrast, is a list of byte sequences, plus 6 (?) transformations that can be applied to these byte sequences before inserting them into the stream. You can prepend data for LZ matches with brotli too.
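    Python's standard-library zlib exposes exactly this prepended-data style of dictionary via its zdict parameter, which makes for a quick illustration of the idea (a generic sketch, not brotli- or zstd-specific; the JSON snippets are invented for the example):

    ```python
    import zlib

    # A shared "dictionary": bytes the data is likely to contain. Deflate seeds
    # its LZ window with them, so matches can point into the dictionary even
    # though it is never emitted in the compressed stream.
    dictionary = b'{"name": "jquery", "version": "3.5.1", "license": "MIT"}'
    data = b'{"name": "jquery", "version": "3.5.1", "license": "MIT", "main": "dist/jquery.js"}'

    plain = zlib.compress(data, 9)

    comp = zlib.compressobj(9, zlib.DEFLATED, zdict=dictionary)
    with_dict = comp.compress(data) + comp.flush()

    # The decompressor must be given the exact same dictionary.
    decomp = zlib.decompressobj(zdict=dictionary)
    assert decomp.decompress(with_dict) == data

    print(len(data), len(plain), len(with_dict))  # the zdict version is smallest
    ```

    Same deal as zstd -D: the dictionary travels out of band, and both sides must agree on it.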

  20. Thanks:

    Jyrki Alakuijala (3rd June 2020)

  21. #259
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    504
    Thanks
    184
    Thanked 177 Times in 120 Posts
    Quote Originally Posted by SolidComp View Post
    Hi all – I'm impressed with the results of compressing jQuery with brotli, compared to Zstd and libdeflate (gzip):

    Original jQuery 3.5.1 (latest): 89,476 bytes (this is the minified production version from jQuery.com: Link)

    libdeflate 1.6 gzip -11: 36,043 (libdeflate adds two extra compression levels to zlib gzip's nine)
    Zstd 1.4.5 -22: 29,453
    brotli 1.0.4 -11: 28,007
    brotli 1.0.7 -11: 27,954

    Update: 7-Zip's gzipper is incredible: 29,960 bytes. I'm not sure why it's so much better than libdeflate, or how it's so close to Zstd and brotli.
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).

  22. #260
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Quote Originally Posted by JamesB View Post
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).
    Zstd's two improvements over zlib are a larger backward-reference window and ANS instead of prefix coding. ANS does not help much (0.5 %) if you have a lot of literals -- in brotli's design experiments ANS was a (small) density loss compared to prefix coding. Larger backward references don't help much if you have a small data size (particularly if it is around 32 kB or less). So it's no big surprise that zstd does not compress better than zlib here.

    Brotli brings three improvements that do help for small data: context modeling, the static dictionary, and super-cheap swapping back to previous entropy codes.

    Optimized JavaScript is not as easy to compress as many other kinds of data, and often we see only ~10 % further savings when moving from zlib to brotli.

  23. Thanks:

    JamesB (4th June 2020)

  24. #261
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    ... plus 6 (?) transformations that can be applied ...
    There are 120 transforms. Appendix B in rfc7932. A custom 'shared brotli' dictionary can bring its own transforms.

  25. #262
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by JamesB View Post
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).
    Strange, I don't get anywhere near that. I didn't know there was a -12 level, and when I run it I get 34,768, nowhere near as good as 7-Zip's gzipper. What's your command line syntax?

  26. #263
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Jyrki, are there any hardware implementations of brotli yet? I mean ASICs or FPGAs, like NZip or Microsoft's Corsica: https://www.servethehome.com/microso...e-performance/

    It would be sweet to get brotli 10 or 11 large window equivalent without having to crush the CPU.

  27. #264
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Quote Originally Posted by SolidComp View Post
    Jyrki, are there any hardware implementations of brotli yet? I mean ASICs or FPGAs, like NZip or Microsoft's Corsica: https://www.servethehome.com/microso...e-performance/
    I'm not aware of hardware brotli. However, any LZ77 hardware engine can be used for brotli compression. I get asked this a lot, and so far I have guided HW people away from brotli towards simpler codecs. Perhaps I should look deeper into it, since logic gates may be cheaper overall than the backward-reference buffer.

    I think many hardware compressors don't implement a sliding window; they do blocks without being able to access a previous block. This gives about a 5 % density disadvantage for the blocking (HW) approach in any typical LZ77 arrangement with current buffer sizes in hardware. As an example, a zstd hardware implementation without a sliding window running at quality 9 (and with 64 kB blocks) would compress less than gzip with a 32 kB sliding window (when compressing a concatenated Silesia compression corpus).

    Quote Originally Posted by SolidComp View Post
    It would be sweet to get brotli 10 or 11 large window equivalent without having to crush the CPU.
    I have ideas on how to get 2/3rd of the way without slowing down the compression, i.e., expect brotli qualities 5-9 to improve by 5-10 % in density in the next five years.

    We have prioritized PIK in the recent years and are now prioritising JPEG XL format freeze.

  28. Thanks:

    SolidComp (16th June 2020)

  29. #265
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    101
    Thanks
    37
    Thanked 28 Times in 19 Posts
    Maybe compression in blocks with static statistics (so as to have parallelization other than pipelining inside the block) is a good match for HW.

    Why parallelization inside the block? Because the number of buffers and the latency decrease.

    But maybe having dual-ported memory to store context for 2 parallel pipelines negates some of the advantage.

    Is there anything with these properties other than static entropy coders?

  30. Thanks:

    SolidComp (16th June 2020)

  31. #266
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Brotli is the only compressor that is significantly reducing the size of this data. I ditched JSON and protobuf and came up with a Positional Data Syntax (PDS), which is super simple and approaches or beats binary even though it's still readable text. With a known set of fields, like for payments or any other typed schema, we can just eliminate field/key names and order the fields so that field IDs are implicit in the ordering. Any fields added in the future can either be given a defined position after the incumbent fields, or identified with a single alphanumeric character/byte at the start of the line. Here's the sample data:

    123456789012
    AB-11-9876
    6011000995500000
    1218123
    0
    TOM JONES
    123 MAIN STREET
    New York
    NY
    US
    55555
    Y
    E
    n
    Y
    Y

    Brotli reduces this from 107 to 91 bytes, and I don't know how. Pavlov's gzip adds 23 bytes. 7z doubles its size. All the zeros on line 3, and the fives later, should help, but not 16 bytes. Is it just Huffman codes or similar?
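    One way to sanity-check the "just Huffman codes?" guess is to compute the order-0 entropy of the sample, i.e., the best any symbol-by-symbol coder could do before the cost of describing the code. A back-of-the-envelope sketch (the byte string below reconstructs the 16 lines above, joined by newlines, which gives the 107 bytes mentioned):

    ```python
    import math
    from collections import Counter

    # The 16-line PDS sample from the post (107 bytes with a trailing newline).
    sample = "\n".join([
        "123456789012", "AB-11-9876", "6011000995500000", "1218123", "0",
        "TOM JONES", "123 MAIN STREET", "New York", "NY", "US",
        "55555", "Y", "E", "n", "Y", "Y",
    ]) + "\n"
    data = sample.encode()
    assert len(data) == 107

    # Order-0 Shannon entropy: ignores matches and the cost of the code itself.
    counts = Counter(data)
    n = len(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    print(f"{entropy:.2f} bits/byte, ~{entropy * n / 8:.0f} bytes before code overhead")
    ```

    That works out to roughly 4.4 bits/byte, i.e., around 60 bytes of payload, so an entropy coder plus a compact description of the code plausibly accounts for most of brotli's 91 bytes even without LZ matches.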

  32. #267
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    982
    Thanks
    96
    Thanked 396 Times in 276 Posts
    Quote Originally Posted by SolidComp View Post
    Brotli is the only compressor that is significantly reducing the size of this data.
    Mtcc to 98 bytes:
    https://encode.su/threads/1593-Mtcc-...tes-compressor

  33. Thanks:

    SolidComp (17th June 2020)

  34. #268
    Member
    Join Date
    Mar 2016
    Location
    USA
    Posts
    56
    Thanks
    7
    Thanked 23 Times in 15 Posts
    Quote Originally Posted by SolidComp View Post
    I ditched JSON and protobuf and came up with a Positional Data Syntax (PDS), which is super simple and approaches or beats binary even though it's still readable text.
    If you're always limited to ASCII, a simple packing eliminating the high bit of every byte would get you to 94 bytes. If you're not always limited to ASCII, but other characters are rare, you could use a packed version of UTF-7 encoding (or better yet, a 7-bit UTF-8-style prefix code without self-synchronization, and maybe RLE thrown in).
    Last edited by MegaByte; 17th June 2020 at 05:19.
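    The 94-byte figure is just ceil(107 × 7 / 8) = 94. A minimal sketch of the packing (pack7 is a name made up for the example, not an existing library call):

    ```python
    # Pack 7-bit ASCII: drop each byte's always-zero high bit and
    # concatenate the remaining 7-bit values into a bitstream.
    def pack7(data: bytes) -> bytes:
        bits = 0       # bit accumulator
        nbits = 0      # number of valid bits in the accumulator
        out = bytearray()
        for byte in data:
            assert byte < 0x80, "input must be 7-bit ASCII"
            bits = (bits << 7) | byte
            nbits += 7
            while nbits >= 8:
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
        if nbits:  # flush the final partial byte, zero-padded on the right
            out.append((bits << (8 - nbits)) & 0xFF)
        return bytes(out)

    # 107 ASCII bytes pack into ceil(107 * 7 / 8) = 94 bytes.
    print(len(pack7(b"A" * 107)))  # 94
    ```

    Unpacking needs the original length (or a terminator) to know where the zero padding starts.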

  35. #269
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    841
    Thanks
    241
    Thanked 308 Times in 184 Posts
    Quote Originally Posted by MegaByte View Post
    If you're always limited to ASCII, a simple packing eliminating the high bit of every byte would get you to 94 bytes. If you're not always limited to ASCII, but other characters are rare, you could use a packed version of UTF-7 encoding (or better yet, a 7-bit UTF-8-style prefix code without self-synchronization, and maybe RLE thrown in).
    Brotli can define a 7-bit encoding with very, very few bits. It's just that current encoders don't attempt simple entropy coding without LZ77.

    A 7-bit code would have 128 7s and 128 0s as code lengths, i.e., a "complex entropy code" as described in https://tools.ietf.org/html/rfc7932#section-3.5

    This means that the entropy code of the entropy code of the entropy code could have a single symbol, i.e., 7 at code length 1. This can be expressed with:
    Code:
      HSKIP = 0b11 (skip three values)
      value 0 0b00 (for code 4)
      value 0 0b00 (for code 0)
      value 0 0b00 (for code 5)
      value 0 0b00 (for code 17)
      value 0 0b00 (for code 6)
      value 0 0b00 (for code 16)
      value 1 0b00 (for code 7)
      value 0 0b00 (for code 8)
      value 0 0b00 (for code 9)
      value 0 0b00 (for code 10)
      value 0 0b00 (for code 11)
      value 0 0b00 (for code 12)
      value 0 0b00 (for code 13)
      value 0 0b00 (for code 14)
      value 0 0b00 (for code 15)
    With this entropy code, 7 is the only code length that can be expressed. Thus the entropy code will have 128 7s, causing it to be full and automatically close. No further bits are read.

    Expressing the most compact 7-bit entropy code in brotli seems to take 3.75 bytes.

    More savings for this kind of small data could be obtained by having a separate entropy code for numbers, space, linefeed etc., and context modeling so that the entropy code depends on the previous symbol being a number.
    Last edited by Jyrki Alakuijala; 17th June 2020 at 23:06.

  36. #270
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    504
    Thanks
    184
    Thanked 177 Times in 120 Posts
    Zlib gets it to 96 bytes with Z_RLE, 97 with Z_HUFFMAN_ONLY. It just sucks once you add a gzip wrapper around it too.
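    The same strategies are reachable from Python's stdlib zlib for anyone wanting to reproduce this. A sketch using raw deflate (negative wbits) so wrapper bytes don't muddy the counts; the data reconstructs the sample from the earlier post, and exact sizes will vary by zlib build:

    ```python
    import zlib

    # The ~107-byte positional-data sample from earlier in the thread.
    data = ("123456789012\nAB-11-9876\n6011000995500000\n1218123\n0\n"
            "TOM JONES\n123 MAIN STREET\nNew York\nNY\nUS\n55555\nY\nE\nn\nY\nY\n").encode()

    def deflate(strategy):
        # wbits=-15 produces a raw deflate stream: no zlib/gzip header or checksum.
        c = zlib.compressobj(9, zlib.DEFLATED, -15, 9, strategy)
        return c.compress(data) + c.flush()

    for name, strategy in [("default", zlib.Z_DEFAULT_STRATEGY),
                           ("huffman-only", zlib.Z_HUFFMAN_ONLY),
                           ("rle", zlib.Z_RLE)]:
        out = deflate(strategy)
        assert zlib.decompressobj(-15).decompress(out) == data  # round-trips
        print(f"{name}: {len(out)} bytes (from {len(data)})")
    ```

    Z_HUFFMAN_ONLY disables matching entirely; Z_RLE allows only distance-1 matches, so both isolate how much of the gain comes from entropy coding alone.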
