
Thread: Zstandard

  1. #391
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    899
    Thanks
    84
    Thanked 325 Times in 227 Posts
    Quote Originally Posted by Bulat Ziganshin
    Note that m/t also affects level 21
    Confirmed:

    2,136,340,302 bytes, 4,604.696 sec. - 15.121 sec., zstd -21 --ultra --single-thread (v1.4.4)

  2. #392
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    870
    Thanks
    471
    Thanked 264 Times in 109 Posts
    Thanks @Sportman for exposing this use case with your new benchmark!
    It made it possible to identify the issue, and @terrelln basically nailed it in a new patch.
    This issue will be fixed in the next release (v1.4.5).

  3. Thanks:

    Sportman (18th January 2020)

  4. #393
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    899
    Thanks
    84
    Thanked 325 Times in 227 Posts
    2,337,506,087 bytes, 2,596.459 sec. - 13.159 sec., zstd -18 --single-thread (v1.4.4)
    2,299,225,559 bytes, 3,196.329 sec. - 13.559 sec., zstd -19 --single-thread (v1.4.4)
    2,197,171,322 bytes, 3,947.477 sec. - 14.471 sec., zstd -20 --ultra --single-thread (v1.4.4)
    Last edited by Sportman; 18th January 2020 at 04:01.
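
    To compare these single-thread results level by level, here is a small Python sketch that parses the result lines (the -21 line is copied from post #391) and computes how much smaller each level's output is than the previous level's, plus the extra compression time. It assumes the two times on each line are compression and decompression time, which the posts don't state explicitly.

```python
import re

# Result lines as posted: -18..-20 from post #393, -21 from post #391.
RESULTS = """\
2,337,506,087 bytes, 2,596.459 sec. - 13.159 sec., zstd -18 --single-thread (v1.4.4)
2,299,225,559 bytes, 3,196.329 sec. - 13.559 sec., zstd -19 --single-thread (v1.4.4)
2,197,171,322 bytes, 3,947.477 sec. - 14.471 sec., zstd -20 --ultra --single-thread (v1.4.4)
2,136,340,302 bytes, 4,604.696 sec. - 15.121 sec., zstd -21 --ultra --single-thread (v1.4.4)
"""

# Assumed layout: output size, compression time, decompression time, command line.
LINE_RE = re.compile(r"([\d,]+) bytes, ([\d,.]+) sec\. - ([\d,.]+) sec\., zstd -(\d+)")

def parse(text):
    """Return (level, size_bytes, comp_sec, decomp_sec) tuples sorted by level."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            size, comp, decomp, level = m.groups()
            rows.append((int(level),
                         int(size.replace(",", "")),
                         float(comp.replace(",", "")),
                         float(decomp.replace(",", ""))))
    return sorted(rows)

def level_deltas(rows):
    """Percent size reduction and extra compression seconds vs. the previous level."""
    out = []
    for (_, s0, c0, _), (l1, s1, c1, _) in zip(rows, rows[1:]):
        out.append((l1, 100.0 * (s0 - s1) / s0, c1 - c0))
    return out

for level, saved_pct, extra_sec in level_deltas(parse(RESULTS)):
    print(f"-{level}: {saved_pct:.2f}% smaller, +{extra_sec:.1f} s to compress")
```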

  5. Thanks:

    Cyan (18th January 2020)

  6. #394
    Member
    Join Date
    May 2017
    Location
    United States
    Posts
    9
    Thanks
    3
    Thanked 4 Times in 3 Posts
    Thanks for the benchmarks, Sportman! The multithreading issue is fixed in the latest dev branch, and we now mention --single-thread in the help.

  7. Thanks (2):

    JamesB (20th January 2020),Sportman (18th January 2020)

  8. #395
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    899
    Thanks
    84
    Thanked 325 Times in 227 Posts
    Cyan and Terrelln, thanks for picking up and fixing this issue so fast! I can't wait for the official release to see whether more files show up to 10% storage savings and up to 8% compression time savings in big-data production environments with zstd's ultra modes.

  9. #396
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts
    I'm seeing rumblings about browser support for Zstd. What's the status?

    There's an opportunity here to achieve much better compression of HTML, CSS, JS, and SVG content. I mean better than brotli and of course gzip. The opportunity lies in building a great dictionary. None of the benchmarks I've seen for Zstd employ a dictionary. Zstd has an excellent dictionary feature that could be leveraged to create a web standard static dictionary.

    Brotli has a static dictionary, but it isn't a great one. The strings are too short, it doesn't support modern HTML and JS features/keywords (because it was generated off of old HTML files), and it has a lot of strange entries, like "pittsburgh" (but not for example "Los Angeles", which is much more common) and "CIA World Factbook", which is an extremely rare string. The biggest opportunity with Zstd is to create a static dictionary that would standardize certain conventions and strings in HTML, CSS, and JS source (especially HTML). It could be developed in conjunction with a web minification standard, which we've long needed. A dictionary could then guide and optimize content creation, CMSes, and so forth. For example, it could standardize strings like the beginning of an HTML file as
    Code:
    <!DOCTYPE html><html lang=
    CMSes and minifiers could then implement the standardized strings and minification conventions. There could be hundreds of such standardized strings...

    Anyone know if a web dictionary is already being worked on?
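
    For readers unfamiliar with shared dictionaries, here is a minimal sketch of the idea. To stay within long-standing parts of Python's standard library, it uses zlib's preset-dictionary option (`zdict`) as a stand-in for a Zstd dictionary: both sides hold the same dictionary, so a small HTML file can reference its strings instead of spelling them out. The dictionary contents and the sample page are hypothetical, and the resulting sizes say nothing about how zstd or brotli would perform.

```python
import zlib

# Hypothetical shared web dictionary; a real one would be trained and standardized.
WEB_DICT = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b'<meta name="viewport" content="width=device-width">'
    b'<link rel="stylesheet" href="https://'
    b'<script src="https://'
)

def compress(data, zdict=None):
    # Both sides need byte-identical dictionaries; zlib records the dictionary's Adler-32 in the header.
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(data) + c.flush()

def decompress(blob, zdict=None):
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

# Hypothetical small page whose head matches the dictionary's standardized strings.
page = (
    b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
    b'<meta name="viewport" content="width=device-width">'
    b'<link rel="stylesheet" href="https://cdn.example.com/site.css">'
    b'<title>Hello</title></head><body><p>Hi!</p></body></html>'
)

plain = compress(page)
with_dict = compress(page, WEB_DICT)
assert decompress(with_dict, WEB_DICT) == page
print(len(page), len(plain), len(with_dict))
```

    The gain comes almost entirely from the long standardized prefix: small files have little internal repetition for a compressor to exploit, so a shared dictionary matters most for them.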

  10. Thanks:

    Cyan (2nd April 2020)

  11. #397
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    768
    Thanks
    217
    Thanked 286 Times in 168 Posts
    Quote Originally Posted by SolidComp
    I'm seeing rumblings about browser support for Zstd. What's the status?

    There's an opportunity here to achieve much better compression of HTML, CSS, JS, and SVG content. I mean better than brotli and of course gzip.
    I'm not aware of a single file of HTML, CSS, JS or SVG where Zstd compresses more than brotli. Even if brotli's internal static dictionary is disabled, brotli still wins these tests by 5 % in density.

    Quote Originally Posted by SolidComp
    The opportunity lies in building a great dictionary. None of the benchmarks I've seen for Zstd employ a dictionary. Zstd has an excellent dictionary feature that could be leveraged to create a web standard static dictionary.
    The Zstd dictionaries can be used with brotli, too.

    Shared brotli supports a more flexible custom dictionary format than either Zstd or normal Brotli, one in which the word transforms are available.

    Quote Originally Posted by SolidComp
    Brotli has a static dictionary, but it isn't a great one.
    It is better than any dictionary I have seen so far.

    While people can read the dictionary and speculate, no one has actually run an experiment that shows better performance with a similarly sized dictionary on a general-purpose web benchmark. I suspect it is very difficult to do, and certainly impossible with a simple LZ custom dictionary such as zstd's. Brotli's word-based dictionary saves about 2 bits per word reference thanks to its more compact representation: it addresses whole words, instead of being able to address positions between words, combinations of consecutive words, or combinations of word fragments.
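
    One simplified way to see why word addressing saves bits: if a reference may start at any byte of a B-byte dictionary, naming its start costs about log2(B) bits; if it may only start at one of roughly B/L word boundaries (average word length L), it costs log2(B/L) bits, a saving of log2(L) bits regardless of B. The sketch below computes this. The model and its numbers are assumptions for illustration; it ignores how brotli actually codes word lengths and transforms.

```python
import math

def position_bits(dict_bytes, avg_word_len=1.0):
    """Bits to name one start position when references may only begin every
    avg_word_len bytes on average (1.0 = any byte, as in a plain LZ dictionary)."""
    return math.log2(dict_bytes / avg_word_len)

def word_addressing_saving(dict_bytes, avg_word_len):
    # Byte-addressed cost minus word-addressed cost; algebraically log2(avg_word_len).
    return position_bits(dict_bytes) - position_bits(dict_bytes, avg_word_len)

# Hypothetical sizes: a 128 KiB dictionary whose words average 4 bytes.
print(word_addressing_saving(128 * 1024, 4.0))  # 2.0
```

    With L = 4 the model gives exactly 2 bits; it only shows the direction of the effect, not brotli's measured figure.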

    We can achieve 50 % more compression for specific use cases, but the web hasn't changed so much that the brotli dictionary would need an update. Actually, the brotli dictionary is well suited for compressing human communications from 200 years ago and will likely work in another 200 years.

    Quote Originally Posted by SolidComp
    Anyone know if a web dictionary is already being worked on?
    https://datatracker.ietf.org/doc/dra...brotli-format/

  12. Thanks:

    SolidComp (Yesterday)

  13. #398
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    870
    Thanks
    471
    Thanked 264 Times in 109 Posts
    > Zstd has an excellent dictionary feature that could be leveraged to create a web standard static dictionary.

    We do agree.
    We measured fairly substantial compression ratio improvements by using this technique and applying it to general websites,
    achieving much better compression ratio than any other technique available today, using a small set of static dictionaries.
    (Gains are even better when using site-dedicated dictionaries, but that's a different topic.)



    > Anyone know if a web dictionary is already being worked on?

    Investing time on this topic only makes sense if at least one significant browser manufacturer is willing to ship it. This is a very short list.

    Hence we discussed the topic with Mozilla, even going as far as inviting them to control the decision process regarding dictionaries' content.
    One can follow it here: https://github.com/mozilla/standards...ons/issues/105


    Since this strategy failed, maybe working in the other direction would make more sense:
    produce an initial set of static dictionaries, publish them as candidates, demonstrate the benefits with clear measurements,
    then try to get browsers on board, possibly creating a fork as a substantial demo.

    The main issue is that it's a lot of work.
    Who has enough time to invest in this direction, knowing that even if the benefits are clearly and unquestionably established,
    one should be prepared to see this work dismissed with hand-waved comments because it's "not invented here"?

    So I guess the team investing time in this topic had better have strong relations with relevant parties,
    such as Mozilla, Chrome and of course the Web Committee.
    Last edited by Cyan; 2nd April 2020 at 09:02.

  14. Thanks (3):

    Jarek (Yesterday),Mike (2nd April 2020),SolidComp (Yesterday)

  15. #399
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts
    Jyrki, everything you said is correct, but you're missing something. One of my core ideas here is a prospective dictionary. Your dictionary is retrospective — it was generated based on old or existing HTML source. As a result, it doesn't support modern and near future HTML source very well.

    A prospective dictionary has two advantages:


    1. Efficient encoding of modern and foreseeable near future syntax.
    2. The ability to shape and influence syntax conventions as publishers optimize for the dictionary – a feedback loop.


    In reality a prospective dictionary of the sort I'm advocating would be a combination of prospective and retrospective.

    The tests you're asking for are nearly impossible, since by definition a prospective dictionary isn't based on existing source files but is instead intended to standardize and shape future source files. The kinds of syntax a good prospective dictionary could include are things like:

    Code:
    
    <meta name="viewport" content="width=device-width"> (an increasingly common bit of code)
    <link rel="dns-prefetch" href="https://
    <link rel="preconnect" href="https://
    <link crossorigin="anonymous" media="all" rel="stylesheet" href="https://
    


    These are all nice and chunky strings, and they all represent syntax that could be standardized for the next decade or so.
    If they were present in the brotli or Zstd dictionary, publishers would then optimize their source to include these strings in this exact standardized form.
    What I mean is having the rel before the href and so forth.
    Note that these are just a few examples. A dictionary built this way would be a lot more efficient than the status quo.
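
    The feedback-loop argument can be tried on a toy scale. The sketch below builds a preset dictionary from the strings proposed above and compresses one tag written in the standardized attribute order and in a different order. It uses Python's zlib (`zdict`) as a stand-in for a zstd or brotli shared dictionary, and `example.com` is a placeholder host. With the standardized order, most of the tag is covered by one long dictionary match; the reordered tag needs several shorter matches.

```python
import zlib

# Preset dictionary built from the strings proposed in the post above.
PROSPECTIVE_DICT = (
    b'<meta name="viewport" content="width=device-width">'
    b'<link rel="dns-prefetch" href="https://'
    b'<link rel="preconnect" href="https://'
    b'<link crossorigin="anonymous" media="all" rel="stylesheet" href="https://'
)

def size_with_dict(snippet):
    """Compressed size of snippet (zlib format, level 9) with the dictionary preloaded."""
    c = zlib.compressobj(9, zdict=PROSPECTIVE_DICT)
    return len(c.compress(snippet) + c.flush())

# Same tag: standardized attribute order vs. href before rel.
canonical = b'<link rel="preconnect" href="https://example.com">'
reordered = b'<link href="https://example.com" rel="preconnect">'
print(size_with_dict(canonical), size_with_dict(reordered))
```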



  16. #400
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    768
    Thanks
    217
    Thanked 286 Times in 168 Posts
    Quote Originally Posted by SolidComp
    As a result, it doesn't support modern and near future HTML source very well. ... A dictionary built this way would be a lot more efficient than the status quo.
    Perhaps you are more focused on aesthetics and elegance than on efficiency. Efficiency is something that can be measured in a benchmark, not established by reasoning. For example, when I played with dictionary generation (both zstd --train and shared brotli), I occasionally found that taking 10 random samples of the data and picking the best sample as the dictionary turned out to be more efficient than running either of the more expensive dictionary extraction algorithms. Other times, concatenating 10 random samples was a decent strategy. Thorough thinking, logic and beauty do not necessarily 'win' the dictionary efficiency game.
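
    The two cheap strategies mentioned here (keep the best of 10 random samples, or concatenate 10 random samples) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in these experiments: candidates are scored by the total compressed size of held-out documents using Python's zlib with a preset dictionary (`zdict`) in place of zstd or shared brotli, and the corpus is synthetic.

```python
import random
import zlib

def corpus_cost(docs, zdict=b""):
    """Total compressed bytes of docs, each compressed on its own with preset dictionary zdict."""
    total = 0
    for doc in docs:
        # An empty zdict means "no dictionary" (the baseline).
        c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
        total += len(c.compress(doc) + c.flush())
    return total

def best_single_sample(train, holdout, k=10, seed=0):
    """Strategy 1: try k random training documents as the dictionary, keep the cheapest on holdout."""
    rng = random.Random(seed)
    candidates = rng.sample(train, min(k, len(train)))
    return min(candidates, key=lambda d: corpus_cost(holdout, d))

def concatenated_samples(train, k=10, seed=0, window=32 * 1024):
    """Strategy 2: concatenate k random training documents into one dictionary."""
    rng = random.Random(seed)
    joined = b"".join(rng.sample(train, min(k, len(train))))
    # zlib can only reference the last 32 KiB (its window) of a preset dictionary.
    return joined[-window:]

# Hypothetical corpus of small, template-like HTML documents.
docs = [
    (b'<!DOCTYPE html><html lang="en"><head><title>Page %d</title>'
     b'<meta name="viewport" content="width=device-width"></head>'
     b'<body><p>Item %d</p></body></html>') % (i, i * 7)
    for i in range(40)
]
train, holdout = docs[:20], docs[20:]
single = best_single_sample(train, holdout)
concat = concatenated_samples(train)
print(corpus_cost(holdout), corpus_cost(holdout, single), corpus_cost(holdout, concat))
```

    On a corpus this uniform any sample works well; scoring on held-out data matters because the comparison between strategies, and against zstd --train, has to come from measurements on real traffic.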

    Depending on how well a shared dictionary has been integrated, different 'rotting' times can be observed. SDCH dictionaries rotted within 3-6 months into being mostly useless or even slightly harmful, while with brotli dictionaries we barely see any rot at all. Zstd's dictionary use, while less efficient than shared-brotli-style shared dictionary coding, also likely rots much more slowly than SDCH dictionaries. This is because SDCH mixed the entropy originating from dictionary use with the literals in the data, and then hoped that a general-purpose compressor could make sense of this bit porridge.

    IMHO, we could come up with a unified way to do shared dictionaries and use it across the simple compressors (like zstd and brotli).

  17. #401
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    45
    Thanks
    6
    Thanked 1 Time in 1 Post
    You might then move to some entirely new format, more efficient than HTML, and compress that.

    Does brotli pack each file separately? If so, huge gains could be made if all the text files for a page were solid-packed into one. IIRC, most of the time is wasted on fetching a multitude of small files.
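
    The size side of this question is easy to measure. The sketch below compresses a few hypothetical page resources separately and then as one concatenated stream, using Python's zlib (DEFLATE) as a stand-in for brotli. In the solid stream, later files can reference strings from earlier ones and the per-stream overhead is paid only once; the sketch says nothing about fetch latency.

```python
import zlib

# Hypothetical text resources for one page; they share some vocabulary.
files = {
    "index.html": b'<!DOCTYPE html><html lang="en"><head><link rel="stylesheet" href="site.css">'
                  b'<script src="app.js"></script></head><body><div class="card">Hello</div></body></html>',
    "site.css": b'.card{display:flex;margin:0 auto;padding:1rem}.card:hover{background:#eee}',
    "app.js": b'document.querySelectorAll(".card").forEach(function(card){'
              b'card.addEventListener("click",function(){card.classList.toggle("open")})});',
}

def separate_size(blobs):
    # Each file gets its own stream (header, checksum) and starts with an empty window.
    return sum(len(zlib.compress(b, 9)) for b in blobs)

def solid_size(blobs):
    # One stream: later files can reference strings from earlier ones.
    return len(zlib.compress(b"".join(blobs), 9))

blobs = list(files.values())
print(separate_size(blobs), solid_size(blobs))
```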
