
Thread: Zstandard

  1. #451
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    802
    Thanked 698 Times in 378 Posts
    Sorry, I can't edit the post. I thought that Cyan answered me, but it seems that he answered algorithm.

  2. #452
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts
    Yes, the comment referred to the suggested hash function.

    Indeed, the `lz4` hash is different, using a double-shift instead.

    Since the mixing of high bits seems a bit worse for the existing `lz4` hash function,
    it would imply that the newly proposed hash should perform better (better spread).
    And that's not too difficult to check : replace one line of code, and run on a benchmark corpus (important : have many different files of different types).

    Quite quickly, it appears that this is not the case.
    The "new" hash function (a relative of which used to be present in older `lz4` versions), doesn't compress better, in spite of the presumed better mixing.

    At least, not always, and not predictably.
    I can find a few cases where it compresses better : x-ray (1.010->1.038), ooffice (1.414 -> 1.454),
    but there are also counter examples : mr (1.833 -> 1.761), samba (2.800 -> 2.736), or nci (6.064->5.686).
    So, to a first approximation, the differences are mostly attributable to "noise".

    I believe a reason for this outcome is that the 12-bit hash table is already over-saturated,
    so it doesn't matter that a hash function has "better" mixing:
    all positions in the hash are already in use and will be overwritten before their distance limit.
    Any "reasonably correct" hash is good enough with regards to this lossy scheme (1-slot hash table).

    So, why select one instead of the other?
    Well, speed becomes the next differentiator.

    And in this regard, according to my tests, there is really no competition:
    the double-shift variant is much faster than the mask variant.
    I measure a 20% speed difference between the two, variable depending on the source file, but always to the benefit of the double-shift variant.
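
    To make the comparison concrete, here is a minimal sketch of the two finishing styles, feeding the same single-slot table. The multiplier, hash log and byte-loading details are illustrative assumptions, not the actual `lz4` source (whose 64-bit hash wraps the multiply in a left and a right shift, hence "double shift"):

    #include <stdint.h>

    #define HASH_LOG 12                         /* 12-bit table: 4096 single-slot entries */
    #define PRIME32  2654435761U                /* illustrative multiplicative constant */

    /* shift variant: multiply, keep the HIGH bits with a right shift */
    static uint32_t hash_shift(uint32_t bytes) { return (bytes * PRIME32) >> (32 - HASH_LOG); }

    /* mask variant: multiply, keep the LOW bits with an AND */
    static uint32_t hash_mask(uint32_t bytes)  { return (bytes * PRIME32) & ((1U << HASH_LOG) - 1); }

    /* lossy single-slot match finder: every insertion overwrites whatever was there,
       which is why, once the table is saturated, the quality of the mixing matters
       much less than the cost of computing the hash itself */
    static uint32_t hashTable[1 << HASH_LOG];

    static void insertPosition(uint32_t bytes, uint32_t pos)
    {
        hashTable[hash_shift(bytes)] = pos;     /* swap in hash_mask() to test the other variant */
    }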

    I suspect the speed advantage comes from more than just the instructions spent on the hash itself.
    It seems to "blend" better with the rest of the match search,
    maybe due to instruction density, re-use of intermediate registers, or impact on match search pattern.

    Whatever the reason, the difference is large enough to tilt the comparison in favor of the double-shift variant.

  3. #453
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    802
    Thanked 698 Times in 378 Posts
    Thank you for the extremely quick check. To me, the speed loss looks counter-intuitive: on Intel CPUs, AND can be performed on any of 4 ALUs, while SHL can be performed on only 2 ALUs, so AND shouldn't be any worse.

    Maybe it will be different on other non-ARM CPUs, in particular AMD Zen.

  4. #454
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    821
    Thanks
    253
    Thanked 266 Times in 164 Posts
    Zstandard v1.4.7

    Repository: facebook/zstd · Tag: v1.4.7 · Commit: 645a297 · Released by: Cyan4973
    v1.4.7 unleashes several months of improvements across many axes, from performance, to various fixes, to new capabilities, of which a few are highlighted below. It's a recommended upgrade.
    (Note: if you ever wondered what happened to v1.4.6, it's an internal release number reserved for synchronization with the Linux Kernel)
    Improved --long mode

    --long mode makes it possible to analyze vast quantities of data in reasonable time and memory budget. The --long mode algorithm runs on top of the regular match finder, and both contribute to the final compressed outcome.
    However, the fact that these 2 stages were working independently resulted in minor discrepancies at highest compression levels, where the cost of each decision must be carefully monitored. For this reason, in situations where the input is not a good fit for --long mode (no large repetition at long distance), enabling it could reduce compression performance, even if by very little, compared to not enabling it (at high compression levels). This situation made it more difficult to "just always enable" the --long mode by default.
    This is fixed in this version. For compression levels 16 and up, usage of --long will now never regress compared to compression without --long. This property made it possible to ramp up --long mode contribution to the compression mix, improving its effectiveness.
    The compression ratio improvements are most notable when --long mode is actually useful. In particular, --patch-from (which implicitly relies on --long) shows excellent gains from the improvements. We present some brief results here (tested on MacBook Pro 16", i9).
    Since --long mode is now always beneficial at high compression levels, it's now automatically enabled for any window size of 128 MB and larger.
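    For library users, the CLI's --long flag corresponds to the long-distance-matching parameters of the advanced API. A minimal sketch of enabling it on a compression context (window log value chosen arbitrarily, error checks omitted):
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);  /* equivalent of --long */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);                  /* 128 MB window */
    /* ... then compress with ZSTD_compress2() or the streaming API ... */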
    Faster decompression of small blocks

    This release includes optimizations that significantly speed up decompression of small blocks and small data. The decompression speed gains will vary based on the block size according to the table below:
    Block Size | Decompression Speed Improvement
    -----------|--------------------------------
    1 KB       | ~+30%
    2 KB       | ~+30%
    4 KB       | ~+25%
    8 KB       | ~+15%
    16 KB      | ~+10%
    32 KB      | ~+5%
    These optimizations come from improving the process of reading the block header, and building the Huffman and FSE decoding tables. zstd’s default block size is 128 KB, and at this block size the time spent decompressing the data dominates the time spent reading the block header and building the decoding tables. But, as blocks become smaller, the cost of reading the block header and building decoding tables becomes more prominent.
    CLI improvements

    The CLI received several noticeable upgrades with this version.
    To begin with, zstd can accept a new parameter through an environment variable, ZSTD_NBTHREADS. It's useful when zstd is called behind an application (tar, or a python script for example). Also, users who prefer multithreaded compression by default can now set a desired number of threads in their environment. This setting can still be overridden on demand via the command line.
    A new command --output-dir-mirror makes it possible to compress a directory containing subdirectories (typically with the -r command), producing one compressed file per source file, and to reproduce the directory tree in a selected destination directory.
    There are various other improvements, such as more accurate warning and error messages, full equivalence between the conventions --long-command=FILE and --long-command FILE, fixed confusion risks between stdin and user prompt, or between console output and status messages, as well as a new short execution summary when processing multiple files, all cumulatively contributing to a nicer command line experience.
    New experimental features

    Shared Thread Pool

    By default, each compression context can be set to use a maximum number of threads.
    In complex scenarios, there might be multiple compression contexts, working in parallel, and each using some number of threads. In such cases, it might be desirable to control the total number of threads used by all these compression contexts altogether.
    This is now possible, by making all these compression contexts share the same thread pool. This capability is exposed through a new advanced function, ZSTD_CCtx_refThreadPool(), contributed by @marxin. See its documentation for more details.
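    A minimal usage sketch, using the thread pool entry points from the experimental section of zstd.h (requires ZSTD_STATIC_LINKING_ONLY; error checks omitted): two contexts draw their workers from one shared pool of 8 threads.
    ZSTD_threadPool* pool = ZSTD_createThreadPool(8);
    ZSTD_CCtx* cctx1 = ZSTD_createCCtx();
    ZSTD_CCtx* cctx2 = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx1, ZSTD_c_nbWorkers, 4);   /* each context still declares its own workers */
    ZSTD_CCtx_setParameter(cctx2, ZSTD_c_nbWorkers, 4);
    ZSTD_CCtx_refThreadPool(cctx1, pool);                 /* both contexts now share the same pool */
    ZSTD_CCtx_refThreadPool(cctx2, pool);
    /* ... compress with both contexts ... */
    ZSTD_freeCCtx(cctx1);
    ZSTD_freeCCtx(cctx2);
    ZSTD_freeThreadPool(pool);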
    Faster Dictionary Compression

    This release introduces a new experimental dictionary compression algorithm, applicable to mid-range compression levels, employing strategies such as ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2. This new algorithm can be triggered by selecting the compression parameter ZSTD_c_enableDedicatedDictSearch during ZSTD_CDict creation (experimental section).
    Benchmarks show the new algorithm providing significant compression speed gains:
    (benchmark table omitted)
    We hope it will help make mid-level compression more attractive for dictionary scenarios. See the documentation for more details. Feedback is welcome!
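    A hedged sketch of how the parameter might be applied at CDict creation time, via the CCtx-params route of the experimental section (the exact prototypes, notably ZSTD_createCDict_advanced2() and ZSTD_defaultCMem, should be checked against zstd.h; dictBuffer/dictSize are placeholders, error checks omitted):
    ZSTD_CCtx_params* params = ZSTD_createCCtxParams();
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_compressionLevel, 12);         /* a mid-range level */
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_enableDedicatedDictSearch, 1); /* new in v1.4.7 */
    ZSTD_CDict* cdict = ZSTD_createCDict_advanced2(dictBuffer, dictSize,
                                                   ZSTD_dlm_byCopy, ZSTD_dct_auto,
                                                   params, ZSTD_defaultCMem);
    /* ... use cdict via ZSTD_CCtx_refCDict() or ZSTD_compress_usingCDict() ... */
    ZSTD_freeCCtxParams(params);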
    New Sequence Ingestion API

    We introduce a new entry point, ZSTD_compressSequences(), which makes it possible for users to define their own sequences, by whatever mechanism they prefer, and present them to this entry point, which will generate a single zstd-compressed frame based on the provided sequences.
    So for example, users can now feed to the function an array of externally generated ZSTD_Sequence:
    [(offset: 5, matchLength: 4, litLength: 10), (offset: 7, matchLength: 6, litLength: 3), ...] and the function will output a zstd compressed frame based on these sequences.
    This experimental API currently has several limitations (and its relevant parameters live in the "experimental" section). Notably, this API currently ignores any repeat offsets provided, instead always recalculating them on the fly. Additionally, there is no way to force the use of certain zstd features, such as RLE or raw blocks.
    If you are interested in this new entry point, please refer to zstd.h for more detailed usage instructions.
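    A rough sketch of the call shape (experimental API; the exact prototype, and the requirement that the sequences fully describe src, should be checked in zstd.h; dst/dstCapacity/src/srcSize are placeholders, error checks and real sequence generation omitted):
    ZSTD_Sequence seqs[] = {
        { 5, 10, 4, 0 },   /* offset 5, litLength 10, matchLength 4 */
        { 7,  3, 6, 0 },   /* offset 7, litLength  3, matchLength 6 */
    };
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    size_t cSize = ZSTD_compressSequences(cctx, dst, dstCapacity,
                                          seqs, sizeof(seqs)/sizeof(seqs[0]),
                                          src, srcSize);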
    Changelog

    There are many other features and improvements in this release, and since we can’t highlight them all, they are listed below:

    • perf: stronger --long mode at high compression levels, by @senhuang42
    • perf: stronger --patch-from at high compression levels, thanks to --long improvements
    • perf: faster decompression speed for small blocks, by @terrelln
    • perf: faster dictionary compression at medium compression levels, by @felixhandte
    • perf: small speed & memory usage improvements for ZSTD_compress2(), by @terrelln
    • perf: minor generic decompression speed improvements, by @helloguo
    • perf: improved fast compression speeds with Visual Studio, by @animalize
    • cli : Set nb of threads with environment variable ZSTD_NBTHREADS, by @senhuang42
    • cli : new --output-dir-mirror DIR command, by @xxie24 (#2219)
    • cli : accept decompressing files with *.zstd suffix
    • cli : --patch-from can compress stdin when used with --stream-size, by @bimbashrestha (#2206)
    • cli : provide a condensed summary by default when processing multiple files
    • cli : fix : stdin input can no longer be confused with user prompt
    • cli : fix : console output no longer mixes stdout and status messages
    • cli : improve accuracy of several error messages
    • api : new sequence ingestion API, by @senhuang42
    • api : shared thread pool: control total nb of threads used by multiple compression jobs, by @marxin
    • api : new ZSTD_getDictID_fromCDict(), by @LuAPi
    • api : zlibWrapper only uses public API, and is compatible with dynamic library, by @terrelln
    • api : fix : multithreaded compression has predictable output even in special cases (see #2327) (issue not present on cli)
    • api : fix : dictionary compression correctly respects dictionary compression level (see #2303) (issue not present on cli)
    • api : fix : return dstSize_tooSmall error whenever appropriate
    • api : fix : ZSTD_initCStream_advanced() with static allocation and no dictionary
    • build: fix cmake script when employing path including spaces, by @terrelln
    • build: new ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics
    • build: new STATIC_BMI2 macro for compile time detection of BMI2 on MSVC, by @Niadb (#2258)
    • build: improved compile-time detection of aarch64/neon platforms, by @bsdimp
    • build: Fix building on AIX 5.1, by @likema
    • build: compile paramgrill with cmake on Windows, requested by @mirh
    • build: install pkg-config file with CMake and MinGW, by @tonytheodore (#2183)
    • build: Install DLL with CMake on Windows, by @BioDataAnalysis (#2221)
    • build: fix : cli compilation with uclibc
    • misc: Improve single file library and include dictBuilder, by @cwoffenden
    • misc: Fix single file library compilation with Emscripten, by @yoshihitoh (#2227)
    • misc: Add freestanding translation script in contrib/freestanding_lib, by @terrelln
    • doc : clarify repcode updates in format specification, by @felixhandte

  5. Thanks (5):

    avitar (22nd December 2020),Bulat Ziganshin (17th December 2020),Cyan (18th December 2020),JamesB (18th December 2020),Shelwien (17th December 2020)

  6. #455
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,402 Times in 804 Posts
    i7-7820X @ 4.5 GHz
    mingw gcc 8.2 -O9 -m64 -march=skylake
    zstd 1.4.5 vs zstd 1.4.7 enwik8 blockwise compression test (32kb blocks):
    http://nishi.dreamhosters.com/u/zstd147.html

  7. Thanks (4):

    Bulat Ziganshin (19th December 2020),Cyan (18th December 2020),Jarek (18th December 2020),Mike (17th December 2020)

  8. #456
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    82
    Thanks
    8
    Thanked 6 Times in 6 Posts
    So it's worse? You didn't specify units.

  9. #457
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts
    The decompression speed improvements for small blocks use several combined techniques,
    one of which is to constrain the entropy header in a way which is faster to decode,
    although it's slightly less accurate, and therefore costs a little bit of compression ratio.

    This is probably what is observed in @Shelwien's scenario.

    The optimization is mostly focused on 4 KB blocks, which is a typical size for a RAM or storage page.
    It's also designed to be broadly applicable to blocks <= 16 KB, which should benefit most database designs.
    ~32 KB is about the limit where the technique may apply or not, depending on circumstances (compressibility being one factor).

  10. Thanks:

    Jarek (18th December 2020)

  11. #458
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,138
    Thanks
    320
    Thanked 1,402 Times in 804 Posts
    Added MB/s units.
    Decoding is ~7% faster, encoding is faster for levels 5-12, and compression ratio is negligibly worse (~0.004%).

  12. #459
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    821
    Thanks
    253
    Thanked 266 Times in 164 Posts
    https://www.phoronix.com/scan.php?pa...1.4.9-Released
    https://www.reddit.com/r/linux/comme...long_distance/


    Zstandard v1.4.9

    Repository: facebook/zstd · Tag: v1.4.9 · Commit: e4558ff · Released by: felixhandte
    This is an incremental release which includes various improvements and bug-fixes.
    >2x Faster Long Distance Mode

    Long Distance Mode (LDM) --long just got a whole lot faster thanks to optimizations by @mpu in #2483! These optimizations preserve the compression ratio but drastically speed up compression. It is especially noticeable in multithreaded mode, because the long distance match finder is not parallelized. Benchmarking with zstd -T0 -1 --long=31 on an Intel i9-9900K at 3.2 GHz, we see:
    (benchmark table omitted)
    * linux-versions is a concatenation of the linux 4.0, 5.0, and 5.10 git archives.
    New Experimental Decompression Feature: ZSTD_d_refMultipleDDicts

    If the advanced parameter ZSTD_d_refMultipleDDicts is enabled, then multiple calls to ZSTD_refDDict() will be honored in the corresponding DCtx. Example usage:
    ZSTD_DCtx* dctx = ZSTD_createDCtx();
    ZSTD_DCtx_setParameter(dctx, ZSTD_d_refMultipleDDicts, ZSTD_rmd_refMultipleDDicts);
    ZSTD_DCtx_refDDict(dctx, ddict1);
    ZSTD_DCtx_refDDict(dctx, ddict2);
    ZSTD_DCtx_refDDict(dctx, ddict3);
    ...
    ZSTD_decompress...
    Decompression of multiple frames, each with their own dictID, is now possible with a single ZSTD_decompress call. As long as the dictID from each frame header references one of the dictIDs within the DCtx, then the corresponding dictionary will be used to decompress that particular frame. Note that this feature is disabled with a statically-allocated DCtx.
    This release has 8 assets:

    • zstd-1.4.9.tar.gz
    • zstd-1.4.9.tar.gz.sha256
    • zstd-1.4.9.tar.zst
    • zstd-1.4.9.tar.zst.sha256
    • zstd-v1.4.9-win32.zip
    • zstd-v1.4.9-win64.zip
    • Source code (zip)
    • Source code (tar.gz)

    Visit the release page to download them.

