
Thread: Zstandard

  1. #421
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    I like the new --filelist flag in Zstd 1.4.5. It makes me wonder if it's possible to give Zstd a folder and have it compress all the files in the folder individually (rather than compressing the folder as a single archive). Anyone know?

  2. #422
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    This makes me wonder if it's possible to give Zstd a folder and have it compress all the files in the folder individually
    That's indeed possible:

    Code:
    zstd -r directory/
    https://www.mankier.com/1/zstd#-r
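    Each file then gets compressed individually into its own .zst, e.g. (file names hypothetical):

    Code:
    zstd -r directory/
    # -> directory/a.csv.zst, directory/logs/b.log.zst, ...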

  3. Thanks:

    SolidComp (24th May 2020)

  4. #423
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    981
    Thanks
    96
    Thanked 396 Times in 276 Posts
    enwik10:
    3,638,532,709 bytes, 23.290 sec. - 10.649 sec., zstd -1 --ultra (v1.4.5)
    3,325,793,817 bytes, 32.959 sec. - 11.632 sec., zstd -2 --ultra (v1.4.5)
    3,137,188,839 bytes, 42.442 sec. - 11.994 sec., zstd -3 --ultra (v1.4.5)
    3,072,048,223 bytes, 44.923 sec. - 12.828 sec., zstd -4 --ultra (v1.4.5)
    2,993,531,459 bytes, 72.322 sec. - 12.827 sec., zstd -5 --ultra (v1.4.5)
    2,921,997,106 bytes, 95.852 sec. - 12.613 sec., zstd -6 --ultra (v1.4.5)
    2,819,369,488 bytes, 132.442 sec. - 11.922 sec., zstd -7 --ultra (v1.4.5)
    2,780,718,316 bytes, 168.737 sec. - 11.724 sec., zstd -8 --ultra (v1.4.5)
    2,750,214,835 bytes, 237.175 sec. - 11.574 sec., zstd -9 --ultra (v1.4.5)
    2,694,582,971 bytes, 283.778 sec. - 11.564 sec., zstd -10 --ultra (v1.4.5)
    2,669,751,039 bytes, 355.330 sec. - 11.651 sec., zstd -11 --ultra (v1.4.5)
    2,645,099,063 bytes, 539.770 sec. - 11.658 sec., zstd -12 --ultra (v1.4.5)
    2,614,435,940 bytes, 717.361 sec. - 11.766 sec., zstd -13 --ultra (v1.4.5)
    2,569,453,043 bytes, 894.063 sec. - 11.872 sec., zstd -14 --ultra (v1.4.5)
    2,539,608,782 bytes, 1,198.939 sec. - 11.795 sec., zstd -15 --ultra (v1.4.5)
    2,450,374,607 bytes, 1,397.298 sec. - 11.547 sec., zstd -16 --ultra (v1.4.5)
    2,372,309,135 bytes, 1,994.123 sec. - 11.414 sec., zstd -17 --ultra (v1.4.5)
    2,339,536,175 bytes, 2,401.207 sec. - 11.819 sec., zstd -18 --ultra (v1.4.5)
    2,299,200,392 bytes, 3,093.583 sec. - 12.295 sec., zstd -19 --ultra (v1.4.5)
    2,196,998,753 bytes, 3,838.985 sec. - 12.952 sec., zstd -20 --ultra (v1.4.5)
    2,136,031,972 bytes, 4,488.867 sec. - 13.171 sec., zstd -21 --ultra (v1.4.5)
    2,079,998,491 bytes, 5,129.788 sec. - 12.915 sec., zstd -22 --ultra (v1.4.5)

    enwik10:
    3,642,089,943 bytes, 28.752 sec. - 10.717 sec., zstd -1 --ultra --single-thread (v1.4.5)
    3,336,007,957 bytes, 38.991 sec. - 11.808 sec., zstd -2 --ultra --single-thread (v1.4.5)
    3,133,763,440 bytes, 48.671 sec. - 12.157 sec., zstd -3 --ultra --single-thread (v1.4.5)
    3,065,081,662 bytes, 50.724 sec. - 12.904 sec., zstd -4 --ultra --single-thread (v1.4.5)
    2,988,125,022 bytes, 79.664 sec. - 13.073 sec., zstd -5 --ultra --single-thread (v1.4.5)
    2,915,934,603 bytes, 103.971 sec. - 12.798 sec., zstd -6 --ultra --single-thread (v1.4.5)
    2,811,448,067 bytes, 148.300 sec. - 12.125 sec., zstd -7 --ultra --single-thread (v1.4.5)
    2,775,621,897 bytes, 188.946 sec. - 11.804 sec., zstd -8 --ultra --single-thread (v1.4.5)
    2,744,751,362 bytes, 255.285 sec. - 11.929 sec., zstd -9 --ultra --single-thread (v1.4.5)
    2,690,272,721 bytes, 304.380 sec. - 11.737 sec., zstd -10 --ultra --single-thread (v1.4.5)
    2,663,964,945 bytes, 380.876 sec. - 11.848 sec., zstd -11 --ultra --single-thread (v1.4.5)
    2,639,230,515 bytes, 561.791 sec. - 11.774 sec., zstd -12 --ultra --single-thread (v1.4.5)
    2,609,728,690 bytes, 705.747 sec. - 11.646 sec., zstd -13 --ultra --single-thread (v1.4.5)
    2,561,381,234 bytes, 896.689 sec. - 11.777 sec., zstd -14 --ultra --single-thread (v1.4.5)
    2,527,193,467 bytes, 1,227.455 sec. - 11.893 sec., zstd -15 --ultra --single-thread (v1.4.5)
    2,447,614,045 bytes, 1,360.777 sec. - 11.614 sec., zstd -16 --ultra --single-thread (v1.4.5)
    2,370,639,588 bytes, 1,953.282 sec. - 11.641 sec., zstd -17 --ultra --single-thread (v1.4.5)
    2,337,506,087 bytes, 2,411.038 sec. - 11.971 sec., zstd -18 --ultra --single-thread (v1.4.5)
    2,299,225,559 bytes, 2,889.098 sec. - 12.184 sec., zstd -19 --ultra --single-thread (v1.4.5)
    2,197,171,322 bytes, 3,477.477 sec. - 12.862 sec., zstd -20 --ultra --single-thread (v1.4.5)
    2,136,340,302 bytes, 4,024.675 sec. - 12.940 sec., zstd -21 --ultra --single-thread (v1.4.5)
    2,080,479,075 bytes, 4,568.550 sec. - 12.934 sec., zstd -22 --ultra --single-thread (v1.4.5)

    Difference, zstd -x --ultra --single-thread, v1.4.4 versus v1.4.5 (Δ compression time, Δ decompression time, Δ compressed size):
    zstd -1, 0.168 sec., -0.137 sec., -501 bytes
    zstd -2, 0.389 sec., -0.610 sec., -87 bytes
    zstd -3, 0.021 sec., -0.490 sec., 0 bytes
    zstd -4, 0.029 sec., -0.441 sec., 2 bytes
    zstd -5, 1.127 sec., -0.204 sec., 0 bytes
    zstd -6, 1.427 sec., -0.504 sec., 0 bytes
    zstd -7, 4.892 sec., -0.257 sec., 0 bytes
    zstd -8, 2.735 sec., -0.202 sec., 0 bytes
    zstd -9, 2.494 sec., -0.008 sec., 0 bytes
    zstd -10, 3.077 sec., -0.181 sec., 0 bytes
    zstd -11, 3.802 sec., -0.165 sec., 0 bytes
    zstd -12, 0.884 sec., -0.158 sec., 0 bytes
    zstd -13, 0.184 sec., -0.146 sec., 0 bytes
    zstd -14, -2.345 sec., -0.114 sec., 0 bytes
    zstd -15, -3.486 sec., -0.124 sec., 0 bytes
    zstd -16, -3.167 sec., -0.102 sec., 0 bytes
    zstd -17, 15.255 sec., -0.041 sec., 0 bytes
    zstd -18, 39.073 sec., -0.147 sec., 0 bytes
    zstd -19, 16.304 sec., -0.151 sec., 0 bytes
    zstd -20, 17.468 sec., 0.119 sec., 0 bytes
    zstd -21, 9.179 sec., 0.014 sec., 0 bytes
    zstd -22, 10.418 sec., -0.163 sec., 0 bytes

    Minus (-) = improvement
    Last edited by Sportman; 24th May 2020 at 13:38.

  5. Thanks (3):

    Cyan (24th May 2020),Hakan Abbas (25th May 2020),SolidComp (24th May 2020)

  6. #424
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    42
    Thanks
    14
    Thanked 7 Times in 4 Posts
    There's an error if the files in the filelist are separated by Windows-style newlines (0x0d,0x0a):
    Code:
    Error : util.c, 283 : fgets(buf, (int) len, file)
    Unix-style newlines work fine.
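    A possible workaround until this is fixed, assuming a list file named filelist.txt, is to strip the carriage returns first:

    Code:
    # convert CRLF (0x0d,0x0a) line endings to plain LF, then pass the list to zstd
    tr -d '\r' < filelist.txt > filelist.unix.txt
    zstd --filelist=filelist.unix.txt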

  7. Thanks:

    Cyan (24th May 2020)

  8. #425
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    Sportman, why is the compressed size different for single thread vs multithreaded? Is it supposed to produce different results? I thought it would be deterministic at any given compression level.

  9. #426
    Member
    Join Date
    May 2017
    Location
    United States
    Posts
    10
    Thanks
    3
    Thanked 6 Times in 4 Posts
    Quote Originally Posted by SolidComp View Post
    Sportman, why is the compressed size different for single thread vs multithreaded? Is it supposed to produce different results? I thought it would be deterministic at any given compression level.
    Both single-thread and multi-thread modes are deterministic, but they produce different results. Multi-threaded compression produces the same output with any number of threads. The zstd CLI defaults to multi-threaded compression with 1 worker thread; you can opt into single-thread compression with --single-thread.
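    For illustration, the three cases look like this on the command line (file name hypothetical):

    Code:
    zstd -19 data.bin                  # default: multi-threaded code path with 1 worker
    zstd -19 -T4 data.bin              # more workers: same output bytes as with -T1
    zstd -19 --single-thread data.bin  # single-thread code path: different (often slightly smaller) output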

  10. Thanks (2):

    Cyan (25th May 2020),Piglet (25th May 2020)

  11. #427
    Member
    Join Date
    May 2019
    Location
    Japan
    Posts
    26
    Thanks
    4
    Thanked 8 Times in 4 Posts
    zstd is now updated to 1.4.5 in my benchmark: http://kirr.dyndns.org/sequence-compression-benchmark/

    I noticed good improvement in decompression speed for all levels, and some improvement in compression speed for slower levels. (Though I am updating from 1.4.0, so the improvement may be larger than from 1.4.4).

  12. Thanks (3):

    Cyan (25th May 2020),Hakan Abbas (25th May 2020),SolidComp (25th May 2020)

  13. #428
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    Quote Originally Posted by Kirr View Post
    zstd is now updated to 1.4.5 in my benchmark: http://kirr.dyndns.org/sequence-compression-benchmark/

    I noticed good improvement in decompression speed for all levels, and some improvement in compression speed for slower levels. (Though I am updating from 1.4.0, so the improvement may be larger than from 1.4.4).
    Do you build the compressors from source, or do you use the builds provided by the projects?

  14. #429
    Member
    Join Date
    May 2019
    Location
    Japan
    Posts
    26
    Thanks
    4
    Thanked 8 Times in 4 Posts
    Quote Originally Posted by SolidComp View Post
    Do you build the compressors from source, or do you use the builds provided by the projects?
    From source, when possible. Thanks, I will clarify this on the website (in the next update).

  15. #430
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    Can someone explain the new "patch size" feature in Zstd 1.4.5?
    --patch-from is a new capability designed to reduce the size of transmitted data when updating a file from one version to another.

    In this model, it is assumed that:
    - the old version is present at destination site
    - new and old versions are relatively similar, with only a handful of changes. If that's the case, the compression ratio will be ridiculously good.

    zstd will see the old version as a "dictionary" when generating the patch and when decompressing the new version.
    So it's not a new format: the patch is a regular zstd compressed file.
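    In CLI terms, the workflow looks like this (file names hypothetical):

    Code:
    # create the patch: compress the new version using the old one as the "dictionary"
    zstd --patch-from=old_version new_version -o patch.zst
    # apply the patch at the destination, using the same old version
    zstd -d --patch-from=old_version patch.zst -o new_version_restored
    # (for large files, a bigger window such as --long=31 may be needed on both sides)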

  16. #431
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    775
    Thanked 687 Times in 372 Posts
    Cyan, have you compared it against xdelta and similar algos?

  17. #432
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    I asked FitGirl to test it... got this:
    Code:
    1056507088	d2_game2_003.00               // (1) game data
    1383948734	d2_game2_003.resources        // (2) precomp output
     327523769	d2_game2_003.resources.x5     // xdelta -5
     245798553	d2_game2_003.resources.x5.8   // compressed
     278021923	d2_game2_003.resources.zsp    // zstd -patch
     247363158	d2_game2_003.resources.zsp.8  // compressed
    Speed-wise zstd patching seems good, but it has a 2G window limit, MT support for this is unknown,
    and overall specialized patchers seem to work better.

  18. #433
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    Cyan, have you compared it against xdelta and similar algos?
    So far, we have only thoroughly compared with bsdiff.
    We can certainly extend the comparison to more products, to get a more complete picture.


    MT support for this is unknown
    MT support for --patch-from works just fine.


    In terms of positioning, zstd is trying to bring speed to the formula: fast generation of patches, fast application of patches.
    There are use cases which need this speed and will like the trade-off, compared to more established solutions, which tend to be less flexible in their range of speeds.

    At this stage, we don't try to claim "best" patch size.
    There are a few scenarios where zstd can be quite competitive, but that's not always the case.

    This first release will hopefully help us understand users' expectations, in order to select the next batch of improvements.
    This is new territory for us; there is still plenty of room for improvement, both feature- and performance-wise.

    One aspect still unclear to me is how much benefit a dedicated diff engine could achieve (as opposed to recycling our "normal" search engine) while preserving the zstd format.
    There are, most likely, some limitations introduced by the format, since it wasn't created with this purpose in mind.
    But how much comes from the format, as opposed to the engine?
    Currently, I suspect that the most important limitations come from the engine, hence better patch sizes should be possible.

  19. Thanks (2):

    hexagone (29th May 2020),Shelwien (29th May 2020)

  20. #434
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    775
    Thanked 687 Times in 372 Posts
    We may need a separate topic, but my little insight is the following:

    In image compression, we have a 2D model and try to predict each pixel using data from the left and above. In video, we even have a 3rd dimension (the previous frame).

    But general compression is usually limited to 1D, although repeat distances and literal masking have added a tiny bit of a 2nd dimension to the LZ data model.

    Patching is a natural 2D model: rather than considering the input as "ORIGINAL MODIFIED", you should look at it as

    ORIGINAL
    MODIFIED

    This changes the model for LZ back-references: we should keep a "current pointer" into the ORIGINAL data and try to encode each reference relative to this pointer. That will reduce the encoded reference size and thus allow referencing smaller strings from the ORIGINAL data. Also, we can use masked literals, i.e. use the "corresponding byte" as the context for encoding the current one.


    Knowing that we are patching should also allow faster match searching. Each time the previous match ends, we have

    1) current byte in the MODIFIED data
    2) "current byte" in the ORIGINAL data
    3) last actually used byte in the ORIGINAL data

    So we suppose that the next match may have srcpos near (2) or (3) and dstpos at (1) or a bit later, which lets us look around for smaller matches (2-3 bytes) before going to a full-scale search.

  21. #435
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    54
    Thanks
    8
    Thanked 3 Times in 3 Posts
    What would be the best way to create a dictionary for log files, such as these from SpamAssassin:

    Code:
    Oct 19 03:42:59 localhost spamd[13905]: spamd: connection from blabla.bla.com [ip.add.re.ss]:61395 to port 1783, fd 5
    Oct 19 03:42:59 localhost spamd[13905]: spamd: checking message <0OY10MRFLRL00@blablafake.com> for (unknown):101
    Oct 19 03:43:00 localhost spamd[14032]: spamd: clean message (3.3/8.0) for (unknown):101 in 2.0 seconds, 8848 bytes.
    Oct 19 03:43:00 localhost spamd[14032]: spamd: result: . 3 - DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,MIME_HTML_ONLY,MISSING_FROM,RDNS_NONE scantime=2.0,size=8848,user=(unknown),uid=101,required_score=8.0,rhost=blablafake.com,raddr=ip.add.re.ss,rport=45995,mid=<b9a461d565a@blabla.com>,autolearn=no autolearn_force=no
    How do I manually create a dictionary?

  22. #436
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    What would be the best way to create a dictionary for log files
    This all depends on the storage strategy.
    A dictionary is primarily useful when there are tons of small files (see the training sketch below).

    But if the log lines are just appended into a single file, as is often the case,
    then just compress the file normally;
    it will likely compress very well.
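    For the many-small-files case, a minimal training sketch (file names and paths hypothetical):

    Code:
    # train a dictionary from representative raw sample files
    zstd --train logs/*.log -o spamd.dict
    # compress each small file with the dictionary (decompression needs the same dictionary)
    zstd -D spamd.dict logs/today.log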

  23. Thanks:

    pklat (6th June 2020)

  24. #437
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    780
    Thanks
    240
    Thanked 249 Times in 153 Posts
    Zstd was added to ZIP a few days ago with method ID 20:
    https://pkware.cachefly.net/webdocs/...NOTE-6.3.7.TXT
    https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
    4.4.5 compression method: (2 bytes)

    0 - The file is stored (no compression)
    1 - The file is Shrunk
    2 - The file is Reduced with compression factor 1
    3 - The file is Reduced with compression factor 2
    4 - The file is Reduced with compression factor 3
    5 - The file is Reduced with compression factor 4
    6 - The file is Imploded
    7 - Reserved for Tokenizing compression algorithm
    8 - The file is Deflated
    9 - Enhanced Deflating using Deflate64(tm)
    10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
    11 - Reserved by PKWARE
    12 - File is compressed using BZIP2 algorithm
    13 - Reserved by PKWARE
    14 - LZMA
    15 - Reserved by PKWARE
    16 - IBM z/OS CMPSC Compression
    17 - Reserved by PKWARE
    18 - File is compressed using IBM TERSE (new)
    19 - IBM LZ77 z Architecture
    20 - Zstandard (zstd) Compression
    96 - JPEG variant
    97 - WavPack compressed data
    98 - PPMd version I, Rev 1
    99 - AE-x encryption marker (see APPENDIX E)


  25. Thanks:

    Mike (6th June 2020)

  26. #438
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    Incidentally, it was also added by WinZip as method 93: https://support.winzip.com/hc/en-us/...ted-Zip-format

  27. #439
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    Has anyone benchmarked Zstd on small files using the custom dictionary feature? It would be extremely interesting to see how it does, especially at higher compression levels.

    This article made me wonder how it would fare for that Expedia use case and similar scenarios: “A case study about compression and binary formats for a REST service” by Juliano Julio Costa https://link.medium.com/C2zZ96T876

    They thought they had a winner in LZMA2, but they found that its on-the-fly performance cost made gzip the winner.

  28. #440
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    78
    Thanks
    80
    Thanked 13 Times in 12 Posts
    Quote Originally Posted by Cyan View Post
    > Zstd has an excellent dictionary feature that could be leveraged to create a web standard static dictionary.

    We do agree.
    We measured fairly substantial compression ratio improvements by using this technique and applying it to general websites,
    achieving a much better compression ratio than any other technique available today, using a small set of static dictionaries.
    (Gains are even better when using site-dedicated dictionaries, but that's a different topic.)



    > Anyone know if a web dictionary is already being worked on?

    Investing time on this topic only makes sense if at least one significant browser manufacturer is willing to ship it. This is a very short list.

    Hence we discussed the topic with Mozilla, even going as far as inviting them to control the decision process regarding the dictionaries' content.
    One can follow it here: https://github.com/mozilla/standards...ons/issues/105


    Since this strategy failed, maybe working in the other direction would make more sense:
    produce an initial set of static dictionaries, publish them as candidates, demonstrate the benefits with clear measurements,
    then try to get browsers on board, possibly create a fork as a substantial demo.

    The main issue is that it's a lot of work.
    Who has enough time to invest in this direction, knowing that even if the benefits are clearly and unquestionably established,
    one should be prepared to see this work dismissed with hand-waved comments because it was "not invented here"?

    So I guess the team investing time on this topic had better have strong relations with the relevant parties,
    such as Mozilla, Chrome and of course the Web Committee.
    It might be best to go about it from the standpoint of making browser extensions that compress and decompress the data. Supporting web hosts and/or web developers could build their sites so the server can identify whether the browser has the plug-in for ZSTD-HTML or ZSTD-PHP (as 2 examples) installed and enabled. If it is, the server can send the compressed version to the user, rather than relying on Mozilla to integrate it with the browser itself.

    Akin to Adobe Flash in that way, the normal, uncompressed content can be served if the browser ID or plug-in is not detected or not enabled on the user's end. This will gradually help build the standard and its following, both at the corporate and business level and at the home/end-user level.

    The more people use it, the more popular and important it will become, to the point where Mozilla will be contacting you to implement the standard, rather than you and yours having to jump through hoops and sail oceans just to get them to take notice and make things happen.

    Appealing to their competition might be just as useful, since the Chrome team at Google, or even Opera, may decide they want to one-up Mozilla and any other contenders by supporting it natively. And given that Google and Opera have deals with independent freeware and shareware authors to bundle those browsers with their installations and downloads, that would be an effective way to route around Mozilla not taking notice today, and make them wish they had, going out of their way to implement it as a standard tomorrow.

    That said, the database for HTML/XML/PHP/JavaScript and the other web scripts and languages would need a devoted team to maintain it, as a joint effort that provides updates to each of the browsers and organizations on board with it.

    Preliminary development could use a traditional database that isn't updated regularly but is more than adequate for most business and personal use, letting sites see compression advantages for their content through storage gains, speed gains, or both.

    After it becomes popular or frequently used, there should be no problem getting people to support it, develop for it, or contribute to it, once there are incentives for them to do so.

    I remember how long it took for Java and Adobe Flash to be taken seriously, and once they were (despite the security issues that stifled things later), they took off and everyone was using them. Everyone was developing for them, and all the browsers supported them either directly or via plug-in.

    Even after the security issues, people still keep them around on their systems in case they ever need to load a page that still uses them. Fortunately, you won't have to worry about that issue, since there are no content-execution or container/virtualization issues with decompression in the browser.

    Whether they acknowledge it today, or see it as better than say gzip/compressed as a recognized extension to handle compressed pages, won't matter if you can build the following and usage for it any other way.

    Maybe Facebook can help with that, since they are using ZSTD and LZ4 today to their benefit? Maybe it would benefit them too (and get Mozilla to really take notice) to approach it from that direction instead?

    One of the commenters on the Mozilla Standards link felt that it wasn't sufficiently different from Brotli to be considered for inclusion. My answer to that would be to add a special ZSTD for the web that IS sufficiently different (but still backwards compatible with standard ZSTD), so they can no longer say that, and will see the advantages once features are added that specialize in online use.

    This could be anything from speed optimizations and greater compression ratios to support for server-to-server streaming, specific content handling that is unusual but ideal (such as 360-degree photo compression, or things that enhance speed or efficiency for Cloudflare, Akamai, etc.), and other bells and whistles that aren't there with Brotli and would be useful or advantageous to them.

    But even without that angle, I would definitely approach it as a web plugin or browser extension, with or without their approval; that way it can be actively used and beneficial, gaining ground for a new presentation later once it has the user base it needs for them to see the merit of it.

    James

    P.S.: It's expected that people would add a version name or number, or change the name entirely, to showcase the new design or system. I've seen people make funnies about ZSTD being an std lol, but I'm on the fence as to whether calling it Z-COVID v19 or something else would be better or worse for attention. I'd probably steer clear of renaming it like that. Maybe ZSTD-Web v1.x would be better.

  29. #441
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    78
    Thanks
    80
    Thanked 13 Times in 12 Posts
    (One other thing I was going to mention is this:

    If you start it out as an optional web plugin or browser extension, then not only do several of the valid arguments people had about webmasters being required to use compression become non-issues, but any later native support that Mozilla or others may want to build into the browser can then pull from the same database.

    The database they both pull from will be actively used and updated, and while available, will remain entirely optional, so that nothing is ever forced.

    This ensures cross-compatibility and uniform use, so everyone can enjoy the benefits, with or without official approval, on their own terms.

    It also makes it easier to guard against any exploits that might be possible with the dictionaries. If the dictionary always comes from a single trusted source, one that may or may not be editable but is always viewable by the end user, then tampering and exploitation become harder: everything is known and visible, but not modifiable except through manual updates that require user or developer interaction.)

  30. #442
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    What kind of data formats can Zstd train a dictionary on? Any format?

    More specifically, I want to use CSV files to train some dictionaries for compressing JSON payloads. For example, I want to use CSV files of the most common American last names to train a dictionary, then use the dictionary to compress JSON payloads that include some of those names. Should this work?

    I think the dictionary needs to see repeated strings, right? So one CSV file of names in the training set shouldn't work. I assume I have to duplicate the CSV a couple of times...

  31. #443
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    349
    Thanks
    131
    Thanked 53 Times in 37 Posts
    I'm confused by discussions of streaming on the GitHub page. I thought that only brotli had streaming, and Zstd did not. What's the situation? I thought Jyrki had said something at one point about how with brotli you can decode content as it comes in, as a stream, so you can have the head of an HTML file, for example, before the rest comes in. Is this "streaming" or something else? We also have this feature with gzip, but I thought with Zstd you needed the whole file to do anything with it. Has this changed?

  32. #444
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    What kind of data formats can Zstd train a dictionary on? Any format?
    Yes, any format.

    I want to use CSV files to train some dictionaries for compressing JSON payloads.
    It's better to train on JSON payloads in order to compress JSON payloads.

    I want to use CSV files of the most common American last names to train a dictionary
    (...)
    I think the dictionary needs to see repeated strings right?
    It's only one part of the process.
    If you attempt this method to produce a dictionary, you will probably get some benefits, but hardly the best ones.

    Don't try to second-guess what the trainer is doing. It's hard enough to explain,
    and more importantly, we don't want to commit to any specific method, as that would hamper our ability to improve the trainer in the future.
    What matters is the methodology: use samples representative of the type of data you want to compress.
    Raw samples, without pre-filtering or digestion of any form, work best.
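    As a minimal sketch of that methodology (file names hypothetical), training directly on raw payloads would look like:

    Code:
    # train on raw, representative JSON payloads, no pre-filtering
    zstd --train payloads/*.json -o payloads.dict
    # compress and decompress with the same dictionary
    zstd -D payloads.dict new_payload.json
    zstd -D payloads.dict -d new_payload.json.zst -o roundtrip.json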

  33. Thanks:

    SolidComp (21st June 2020)

  34. #445
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    I thought that only brotli had streaming, and Zstd did not. What's the situation?
    (...) I thought with Zstd you needed the whole file to do anything with it. Has this changed?
    Zstandard has had streaming capability all along.

    It was first introduced in development version `v0.4.0`, and received its final API format in `v0.8.1`.
    `zstd` is (among other things) used to compress very large files, like SSD backups.
    Thankfully, it doesn't require the entire content in memory; as one can guess, that would have been quite problematic.
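    One can see the streaming behavior directly from the CLI by piping through stdin/stdout (file names hypothetical); the data is processed chunk by chunk, never buffering the whole file:

    Code:
    # compress a large backup image as a stream, then stream-decompress it on the fly
    cat ssd_backup.img | zstd -c > ssd_backup.img.zst
    zstd -d -c ssd_backup.img.zst | wc -c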

  35. #446
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    > Jyrki had said something at one point about how with brotli you can decode content as it comes in,
    > as a stream, so you can have the head of an HTML file, for example, before the rest comes in.

    Jyrki talked about decoding delay, how much data you need to start receiving uncompressed bytes.
    zstd needs a whole block (in terms of format structure, it's not a specific number of bytes) to start decoding,
    while brotli (and many other codecs: deflate, lzma, ppmd) can incrementally decode something from any number of received bytes.

    In practice it doesn't matter at all, since most networking software only decodes complete chunks;
    partial decoding is much more troublesome for validation etc.
