Thread: Zstandard

  1. #61
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Zstandard just received an important update (v0.2),
    mainly focused on faster decompression.

    https://github.com/Cyan4973/zstd/releases
    Last edited by Cyan; 23rd October 2015 at 17:32.

  2. The Following 8 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (28th October 2015),inikep (22nd October 2015),Jarek (22nd October 2015),jethro (24th October 2015),Jiwei Ke (22nd October 2015),Jyrki Alakuijala (23rd October 2015),Stephan Busch (22nd October 2015),tobijdc (28th October 2015)

  3. #62
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Zstandard just reached v0.4.

    The major focus of this release is to provide High Compression modes from command line (they were previously only accessible through API).
    All compression levels will see significant ratio improvements on larger files, with the strongest levels benefiting most.

    Source code is available at :
    https://github.com/Cyan4973/zstd/releases

    There is also a pre-compiled Windows 64 binary available.

  4. The Following 7 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (29th November 2015),comp1 (29th November 2015),inikep (29th November 2015),jethro (29th November 2015),Stephan Busch (29th November 2015),tobijdc (29th November 2015),Turtle (29th November 2015)

  5. #63
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    It seems that you have lost 20% of decompression speed in v0.4:

    Code:
    | Compressor name             | Compression| Decompress.| Compr. size | Ratio |
    | zstd_HC v0.3.6 level 1      |   250 MB/s |   529 MB/s |    51230550 | 48.86 |
    | zstd_HC v0.3.6 level 2      |   186 MB/s |   498 MB/s |    49678572 | 47.38 |
    | zstd_HC v0.3.6 level 3      |    90 MB/s |   484 MB/s |    48838293 | 46.58 |
    | zstd_HC v0.3.6 level 4      |    75 MB/s |   474 MB/s |    48423913 | 46.18 |
    | zstd_HC v0.3.6 level 5      |    61 MB/s |   467 MB/s |    46480999 | 44.33 |
    | zstd_HC v0.3.6 level 6      |    40 MB/s |   477 MB/s |    45723093 | 43.60 |
    | zstd_HC v0.3.6 level 7      |    28 MB/s |   480 MB/s |    44803941 | 42.73 |
    | zstd_HC v0.3.6 level 8      |    21 MB/s |   475 MB/s |    44511976 | 42.45 |
    | zstd_HC v0.3.6 level 9      |    15 MB/s |   497 MB/s |    43899996 | 41.87 |
    | zstd_HC v0.3.6 level 10     |    16 MB/s |   493 MB/s |    43845344 | 41.81 |
    | zstd_HC v0.3.6 level 11     |    15 MB/s |   491 MB/s |    42506862 | 40.54 |
    | zstd_HC v0.3.6 level 12     |    11 MB/s |   493 MB/s |    42402232 | 40.44 |
    | zstd v0.4 level 1           |   244 MB/s |   492 MB/s |    51160301 | 48.79 |
    | zstd v0.4 level 2           |   176 MB/s |   443 MB/s |    49719335 | 47.42 |
    | zstd v0.4 level 3           |    88 MB/s |   422 MB/s |    48749022 | 46.49 |
    | zstd v0.4 level 4           |    74 MB/s |   402 MB/s |    48352259 | 46.11 |
    | zstd v0.4 level 5           |    69 MB/s |   387 MB/s |    46389082 | 44.24 |
    | zstd v0.4 level 6           |    36 MB/s |   387 MB/s |    45525313 | 43.42 |
    | zstd v0.4 level 7           |    29 MB/s |   390 MB/s |    44805120 | 42.73 |
    | zstd v0.4 level 8           |    23 MB/s |   389 MB/s |    44509894 | 42.45 |
    | zstd v0.4 level 9           |    16 MB/s |   402 MB/s |    43892280 | 41.86 |
    | zstd v0.4 level 10          |    18 MB/s |   407 MB/s |    43807530 | 41.78 |
    | zstd v0.4 level 11          |    15 MB/s |   417 MB/s |    42498160 | 40.53 |
    | zstd v0.4 level 12          |    11 MB/s |   406 MB/s |    42394424 | 40.43 |
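To put a number on the slowdown, the decompression column can be compared level by level. The sketch below is a small C helper written only for this illustration (the function and array names are not part of zstd or lzbench); it copies the decompression speeds from the table above and computes the relative change from v0.3.6 to v0.4.

```c
#include <stdio.h>

/* Relative change in percent going from `before` to `after`
 * (a negative value means `after` is slower). */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Decompression speeds in MB/s, copied from the table above, levels 1..12. */
static const int v036_dec[12] = {529, 498, 484, 474, 467, 477, 480, 475, 497, 493, 491, 493};
static const int v040_dec[12] = {492, 443, 422, 402, 387, 387, 390, 389, 402, 407, 417, 406};

/* Prints the per-level change and returns the worst (most negative) one. */
static double worst_decompression_change(void)
{
    double worst = 0.0;
    for (int i = 0; i < 12; i++) {
        double change = percent_change(v036_dec[i], v040_dec[i]);
        printf("level %2d: %+.1f%%\n", i + 1, change);
        if (change < worst)
            worst = change;
    }
    return worst;
}
```

With these figures the loss ranges from roughly 7% at level 1 to about 19% at level 9, so the 20% estimate fits the higher levels better than the fastest ones.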

  6. The Following 2 Users Say Thank You to inikep For This Useful Post:

    m^3 (30th November 2015),tobijdc (30th November 2015)

  7. #64
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    The differences in decompression speed come from differences in the instruction alignment decisions made by the compiler.

    The issue was mostly observed with gcc on x64 targets, although other compilers could be affected too. Recent Intel x64 CPUs (Sandy Bridge and later) are the most affected, since the issue is tied to how their instruction fetch hardware is implemented.

    The main problem is that instruction alignment differs for the same source code from one compiler to another, and even with the same compiler when the version, parameters or system libraries differ.
    And a modification in an unrelated part of the code changes alignment decisions later on, in the decompression routines, with positive or negative consequences. A real nightmare when trying to optimize and measure.

    I haven't found a way to reliably force the compiler to make correct alignment decisions.
    `-falign-loops=32` only works on small dense libraries (such as huff0), not for a full program such as `zstd`.

    So my second-best option is to generate PGO-assisted builds.
    In that case, the compiler correctly detects which loops are important and must be aligned for speed.
    Results tend to be better, but they can unfortunately change a bit from one build to the next, because the PGO runtime measurements vary slightly.
    Moreover, while PGO-assisted builds can be automated within the Makefile, it's a Makefile-only solution, meaning that programs which integrate the source files directly (like lzbench) will not benefit from it. So it's only a partial solution.


    Anyway, zstd 0.4.1 is out and tries to mitigate this issue, both by modifying the decompression routine in a way that seems more positive than negative on a range of tested platforms, and by offering a PGO-assisted zstd build.


    Regards

  8. The Following User Says Thank You to Cyan For This Useful Post:

    inikep (1st December 2015)

  9. #65
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Cyan View Post
    I haven't found a way to reliably force the compiler to make correct alignment decisions.
    `-falign-loops=32` only works on small dense libraries (such as huff0), not for a full program such as `zstd`.
    I'm not completely sure what you're referring to, but if you're talking about data you declare, IIRC you can use __attribute__((__aligned__(32))) to make sure the data is 32-byte aligned (other values work too). There might be something in OpenMP if you want something more portable, but I don't know it off the top of my head.

    If your concern is really loops and you have aligned data but the compiler doesn't know it, you could use the SIMD support in OpenMP 4.0+ (it doesn't even require linking to openmp; gcc has -fopenmp-simd, I think clang does too). IIRC `#pragma omp simd aligned(variable_name:32)` should do the trick.

    If you don't know whether the input data is aligned or not, I think the best you're going to be able to do is have both versions and a runtime check + branch.
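A minimal sketch of that "both versions plus a runtime check" idea, assuming gcc or clang. The kernel names (`sum_aligned32`, `sum_unaligned`, `sum_bytes`) are hypothetical and only illustrate the dispatch; `__builtin_assume_aligned` is the compiler builtin that passes the alignment guarantee to the optimizer. This concerns data alignment only.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when p is aligned to `alignment` bytes.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical kernel for buffers known to be 32-byte aligned. */
static uint32_t sum_aligned32(const uint8_t *buf, size_t n)
{
    const uint8_t *p = __builtin_assume_aligned(buf, 32);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Same computation with no alignment assumption. */
static uint32_t sum_unaligned(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Runtime check + branch: pick the version that matches the input. */
static uint32_t sum_bytes(const uint8_t *buf, size_t n)
{
    return is_aligned(buf, 32) ? sum_aligned32(buf, n)
                               : sum_unaligned(buf, n);
}
```

Both paths return the same result; the branch only decides which version the compiler was allowed to optimize under the alignment assumption.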

    FWIW OpenMP 4's SIMD stuff is really quite cool. I learned it a few years ago, but never really had a use for it and ended up forgetting most of it.

  10. #66
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    if you're talking about data you declare, IIRC you can use __attribute__((__aligned__(32)))
    Unfortunately, this is not the issue.

    The problem is instruction alignment property, a much rarer category. It only happens when the algorithm is so densely optimized that the hardware instruction prefetcher becomes the bottleneck.

    There is a fairly good description of the issue here :
    http://pzemtsov.github.io/2014/05/12...rformance.html

  11. The Following 2 Users Say Thank You to Cyan For This Useful Post:

    Bulat Ziganshin (2nd December 2015),nemequ (2nd December 2015)

  12. #67
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by Cyan View Post
    Unfortunately, this is not the issue.

    The problem is instruction alignment property, a much rarer category. It only happens when the algorithm is so densely optimized that the hardware instruction prefetcher becomes the bottleneck.

    There is a fairly good description of the issue here :
    http://pzemtsov.github.io/2014/05/12...rformance.html
    Ah, interesting problem to have. Thanks for the link.

    Have you tried putting __attribute__((__aligned__(32))), and maybe __attribute__((__hot__)), on individual functions? AFAIK there is no way to specify attributes on loops, but it seems like putting them on functions would be better than nothing. You could also try putting the loops in their own (static, maybe static inline) functions and adding the attributes there… If you do that, there are also the pure, and maybe const, attributes.
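As a rough sketch of this suggestion (gcc/clang syntax; the macro, the function name and the loop body are made up for illustration and are not zstd code), the hot loop is moved into its own static function and the attributes are attached there. Note that `aligned` on a function sets the minimum alignment of the function's first instruction; it does not by itself align the loop inside it, and a later post in this thread reports that `hot` alone did not align the loops either.

```c
#include <stddef.h>
#include <stdint.h>

/* Compiler-specific attributes hidden behind a macro so that other
 * compilers still build the code (hypothetical macro name). */
#if defined(__GNUC__)
#  define HOT_ALIGNED32 __attribute__((__hot__, __aligned__(32)))
#else
#  define HOT_ALIGNED32
#endif

/* Hypothetical hot loop, isolated in its own function so the
 * attributes apply to this code only. */
HOT_ALIGNED32
static size_t copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}
```

Keeping the attributes in the source rather than in the build flags means they travel with the file when it is embedded in another project.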
    Last edited by nemequ; 2nd December 2015 at 04:26.

  13. The Following 2 Users Say Thank You to nemequ For This Useful Post:

    Bulat Ziganshin (2nd December 2015),Cyan (2nd December 2015)

  14. #68
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Good point !
    Thanks nemequ, I didn't know the __hot__ attribute.
    Interesting to test....

  15. #69
    Programmer
    Join Date
    May 2008
    Location
    PL
    Posts
    307
    Thanks
    68
    Thanked 166 Times in 63 Posts
    v0.4.1 works great for me:
    Code:
    Compressor name             Compression Decompress.  Compr. size  Ratio
    zstd_HC v0.3.6 level 1         263 MB/s    544 MB/s     51230550  48.86
    zstd_HC v0.3.6 level 2         195 MB/s    525 MB/s     49678572  47.38
    zstd_HC v0.3.6 level 3          97 MB/s    513 MB/s     48838293  46.58
    zstd_HC v0.3.6 level 4          81 MB/s    508 MB/s     48423913  46.18
    zstd_HC v0.3.6 level 5          72 MB/s    489 MB/s     46480999  44.33
    zstd v0.4.1 level 1            256 MB/s    568 MB/s     51160301  48.79
    zstd v0.4.1 level 2            186 MB/s    531 MB/s     49719335  47.42
    zstd v0.4.1 level 3             95 MB/s    521 MB/s     48749022  46.49
    zstd v0.4.1 level 4             78 MB/s    514 MB/s     48352259  46.11
    zstd v0.4.1 level 5             77 MB/s    484 MB/s     46389082  44.24
    zstd v0.4.1 level 6             39 MB/s    493 MB/s     45525313  43.42
    zstd v0.4.1 level 7             34 MB/s    501 MB/s     44805120  42.73
    zstd v0.4.1 level 8             25 MB/s    509 MB/s     44509894  42.45
    zstd v0.4.1 level 9             17 MB/s    525 MB/s     43892280  41.86
    zstd v0.4.1 level 10            19 MB/s    524 MB/s     43807530  41.78
    zstd v0.4.1 level 11            16 MB/s    521 MB/s     42498160  40.53
    zstd v0.4.1 level 12            13 MB/s    525 MB/s     42394424  40.43
    zstd v0.4.1 level 13            10 MB/s    527 MB/s     42321163  40.36
    zstd v0.4.1 level 14            10 MB/s    529 MB/s     42286879  40.33
    zstd v0.4.1 level 15          8.79 MB/s    514 MB/s     42258368  40.30

  16. The Following User Says Thank You to inikep For This Useful Post:

    Cyan (2nd December 2015)

  17. #70
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Cyan View Post
    I haven't found a way to reliably force the compiler to make correct alignment decisions.
    `-falign-loops=32` only works on small dense libraries (such as huff0), not for a full program such as `zstd`.
    Why do you have to apply it to the full program? Have you tried compiling just the file that needs it with that option?

    Quote Originally Posted by nemequ View Post
    Ah, interesting problem to have. Thanks for the link.

    Have you tried putting __attribute__((__aligned__(32))), and maybe __attribute__((__hot__)), on individual functions? AFAIK there is no way to specify attributes on loops, but it seems like putting them on functions would be better than nothing. You could also try putting the loops in their own (static, maybe static inline) functions and adding the attributes there… If you do that, there are also the pure, and maybe const, attributes.
    Recent versions of gcc also have the attribute "optimize", which lets you set optimization options for single functions:

    optimize
    The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. Arguments can either be numbers or strings. Numbers are assumed to be an optimization level. Strings that begin with O are assumed to be an optimization option, while other options are assumed to be used with a -f prefix. You can also use the ‘#pragma GCC optimize’ pragma to set the optimization options that affect more than one function. See Function Specific Option Pragmas, for details about the ‘#pragma GCC optimize’ pragma.
    This can be used for instance to have frequently-executed functions compiled with more aggressive optimization options that produce faster and larger code, while other functions can be compiled with less aggressive options.
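For reference, a sketch of what the attribute looks like in source, assuming gcc (the macro and function names are hypothetical). Following the rule quoted above, the string `align-loops=32` does not begin with O, so it is treated as `-falign-loops=32`. Whether gcc actually honors this particular option at function level is a separate question; later posts in this thread report no effect.

```c
#include <stddef.h>
#include <stdint.h>

/* The optimize attribute is gcc-specific, so it is only enabled there.
 * Pragma form covering several functions:
 *   #pragma GCC push_options
 *   #pragma GCC optimize ("align-loops=32")
 *   ... function definitions ...
 *   #pragma GCC pop_options
 */
#if defined(__GNUC__) && !defined(__clang__)
#  define OPT_ALIGN_LOOPS __attribute__((optimize("align-loops=32")))
#else
#  define OPT_ALIGN_LOOPS
#endif

/* Hypothetical inner loop; the body is illustrative only. */
OPT_ALIGN_LOOPS
static uint32_t xor_fold(const uint8_t *src, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = (acc << 1) ^ src[i];
    return acc;
}
```

The generated assembly (for example with `gcc -S`) is the place to check whether the loop alignment actually changed.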

  18. The Following 2 Users Say Thank You to nburns For This Useful Post:

    Bulat Ziganshin (3rd December 2015),nemequ (3rd December 2015)

  19. #71
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    Quote Originally Posted by nburns View Post
    Why do you have to apply it to the full program? Have you tried compiling just the file that needs it with that option?
    Why do you have to apply it to the full file? Why not just the function?

    This has the major advantage that people embedding zstd in their projects benefit from it for free. It also keeps the build system simpler which, again, benefits people embedding zstd. IMHO it's usually best to try to keep the build system out of it if you can… pre-defined macros to detect the compiler, OS, or libc, coupled with a bit of compiler-specific magic (pragmas, attributes, etc.) usually let you keep things portable without embedding too much knowledge in your build system.

    Quote Originally Posted by nburns View Post
    Recent versions of gcc also have the attribute "optimize", which lets you set optimization options for single functions:
    Last time I checked, the optimization options you could provide using the optimize attribute were pretty limited. More than once I've had to resort to creating separate files to be able to choose the right optimized version of a function at runtime. OTOH, when it works it is quite nice.

  20. #72
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by nemequ View Post
    Why do you have to apply it to the full file? Why not just the function?

    This has the major advantage that people embedding zstd in their projects benefit from it for free. It also keeps the build system simpler which, again, benefits people embedding zstd. IMHO it's usually best to try to keep the build system out of it if you can… pre-defined macros to detect the compiler, OS, or libc, coupled with a bit of compiler-specific magic (pragmas, attributes, etc.) usually let you keep things portable without embedding too much knowledge in your build system.
    The only possible ways I see to apply -falign-loops=32 at lower than file level are the optimize function attribute and #pragma GCC optimize. They don't seem to have existed prior to gcc 4.4.7. If they work, and you don't mind adding the compiler dependency, then go for it.

    OTOH, the command-line option seems more fool-proof.

    Last time I checked, the optimization options you could provide using the optimize attribute were pretty limited. More than once I've had to resort to creating separate files to be able to choose the right optimized version of a function at runtime. OTOH, when it works it is quite nice.
    I haven't tried it, so I can't promise it will work.

  21. #73
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    I've tried to use align-loops at pragma optimize level.
    But it doesn't work (no effect).

    Indeed, function level optimization control would be ideal.

  22. #74
    Member
    Join Date
    Dec 2015
    Location
    Russia
    Posts
    18
    Thanks
    2
    Thanked 1 Time in 1 Post
    I have two small Windows .bat scripts for batch compression/decompression of files to/from the zst format.
    Should I post them here with details?
    Last edited by Vanfear; 3rd December 2015 at 13:56.

  23. #75
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Sure.
    Do you mean you are looking for a batch compression/decompression feature from zstd ?
    Last edited by Cyan; 3rd December 2015 at 16:08.

  24. #76
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Cyan View Post
    I've tried to use align-loops at pragma optimize level.
    But it doesn't work (no effect).
    I can confirm that this does not seem to work (gcc 4.7.2).

  25. #77
    Member
    Join Date
    Dec 2015
    Location
    Russia
    Posts
    18
    Thanks
    2
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Cyan View Post
    Sure.
    Do you mean you are looking for a batch compression/decompression feature from zstd ?
    Yep.

    zstdPack.bat
    Code:
    @for /f "tokens=1" %%k in ('dir "R:\MyFiles\" /a-d-h-s /b') do (zstd.exe -20 "R:\MyFiles\%%k" "R:\Compressed\%%k.zst")
    @pause > nul
    zstdUnpack.bat
    Code:
    @for /f "tokens=1" %%d in ('dir "R:\Compressed\*.zst" /a-d /b') do (zstd.exe -d "R:\Compressed\%%d" "R:\Unpacked\%%~nd")
    @pause > nul
    Notes:
    1. zstd.exe does not process folders(?), so folders are excluded from processing in the batch scripts.
    2. If the destination folder does not exist, zstd.exe gives an error (code 13). Create it manually or via the batch script.
    3. The -d-h-s part of the dir /a switch excludes directories, hidden files and system files from processing.

    Files in subfolders are excluded from processing by default. They can be included with a small change to the packing script:
    zstdPackInclSubfiles.bat
    Code:
    @for /f "tokens=1" %%k in ('dir "R:\MyFiles\" /a-d-h-s /b /s') do (zstd.exe -20 "%%k" "R:\Compressed\%%~nxk.zst")
    @pause > nul
    But note: in this case filename conflicts are possible in the destination folder, between files from the root folder and files from subfolders.
    Last edited by Vanfear; 4th December 2015 at 16:48.

  26. #78
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    > zstd.exe not processing folders(?), so folders processing excluded from batch scripts.

    Correct.
    What about transforming `directory` into `directory/*` (potentially with recursion) ?


    Your script looks pretty good to me

    What I could do to help, from zstd command line utility, is to allow the processing of multiple filenames.
    This should speed up processing when there is a large amount of small files to compress/decompress,
    because it would avoid creating / freeing a process for each file.

    That being said, directory would still remain out of scope, so your scripts would be necessary to handle this scenario.

  27. #79
    Member
    Join Date
    Dec 2015
    Location
    Russia
    Posts
    18
    Thanks
    2
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Cyan View Post
    What about transforming `directory` into `directory/*` (potentially with recursion) ?
    That being said, directory would still remain out of scope, so your scripts would be necessary to handle this scenario.
    I found a solution that mirrors the source subfolders into the destination, so zstd.exe works like an archiver with basic functions.
    I also found a mistake in all the scripts: files with spaces in their names are processed incorrectly by the script, which results in filename conflicts when they are processed by zstd.exe.

    I'm correcting the code now and will update this post later today.

    Update:

    Colors in scripts: source, destination, commentary.

    zstdPack4.bat

    xcopy "R:\MyFiles" "R:\Compressed\MyFiles\" /t /e /h *
    for /f "delims=" %%k in ('dir "R:\MyFiles\" /a-d-s /b /s') do (zstd.exe -20 "%%k" "R:\Compressed%%~pnxk.zst") **


    * - Creates the structure of the source folder and its subfolders in the destination folder. Hidden folders are included (key /h). The destination folder must have the same name as the source folder.
    ** - All files from the source folder and its subfolders are sent to zstd.exe for processing. Key -d means that folders are not sent to zstd.exe,
    key -s excludes system files (hidden files are included, so h does not need to be specified), and /s includes files from subfolders.
    The code %%~pnxk.zst appends the path of each source file (without the drive letter) to the destination path, so files keep their folder structure:
    (R:\Compressed%%~pnxk.zst = R:\Compressed\MyFiles\And possibly subfolders\name of source file(s)*.zst).


    Code:
    @xcopy "R:\MyFiles" "R:\Compressed\MyFiles\" /t /e /h
    @for /f "delims=" %%k in ('dir "R:\MyFiles\" /a-d-s /b /s') do (zstd.exe -20 "%%k" "R:\Compressed%%~pnxk.zst")
    @pause > nul
    zstdUnpack4.bat

    xcopy "R:\Compressed\MyFiles" "R:\Unpacked\Compressed\MyFiles\" /t /e /h
    for /f "delims=" %%d in ('dir "R:\Compressed\MyFiles\*.zst" /a-d-s /b /s') do (zstd.exe -d "%%d" "R:\Unpacked%%~pnd") *


    * - I can't exclude the source subfolders from the destination path, so the destination will be the specified path plus part of the source folder's path.
    It's not a problem and does not affect processing, but it's not very clean in principle.


    Code:
    @xcopy "R:\Compressed\MyFiles" "R:\Unpacked\Compressed\MyFiles\" /t /e /h
    @for /f "delims=" %%d in ('dir "R:\Compressed\MyFiles\*.zst" /a-d-s /b /s') do (zstd.exe -d "%%d" "R:\Unpacked%%~pnd")
    @pause > nul
    Notes:

    v2
    *The scripts now process folders, subfolders and the files in them.
    *Destination folders are created automatically by the scripts.
    *Fixed the problem with files that have spaces in their names ("tokens=1" replaced with "delims=").
    v3
    *Corrected the source path mask example in the unpacking script (line 1) to prevent it from picking up folders adjacent to the source path when the unpacking script starts.
    v4
    *Corrected the source path mask example in the unpacking script (line 2) to prevent processing of files located in the same directory as the source folder.

    If you want to use variables for the paths, you can define them yourself.
    Example with path variables:


    set sour1=R:\MyFiles
    set sour2=R:\MyFiles\
    set dest1=R:\Compressed\MyFiles\
    set dest2=R:\Compressed
    xcopy "%sour1%" "%dest1%" /t /e /h
    for /f "delims=" %%k in ('dir "%sour2%" /a-d-s /b /s') do (zstd.exe -20 "%%k" "%dest2%%%~pnxk.zst")
    pause

    An updated version of the scripts is bundled with zstd.exe in the attachment.
    Attached Files
    Last edited by Vanfear; 8th December 2015 at 15:58.

  28. The Following User Says Thank You to Vanfear For This Useful Post:

    Cyan (5th December 2015)

  29. #80
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    Actually there is an easier way to do this :P
    Simply download the lz4 installer for Windows from Cyan's blog and install it, rename zstd ~~> lz4,
    and put the renamed zstd into the program folder in place of lz4. Then use it to compress.

    Hope it helps

    Enjoy

  30. The Following User Says Thank You to Dimitri For This Useful Post:

    Cyan (12th December 2015)

  31. #81
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    zstd -20 decompression fails for silesia/mozilla and silesia/webster with Error 36 : Decoding error : ZSTD_error_corruption_detected
    -1 and -9 are OK.
    I built from 0.4.2 source for 32 bit Windows Vista, gcc 4.8.1 using "make CC=gcc"

  32. #82
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    The 32-bit version cannot decode files compressed with -20.
    It's limited to -19.
    (I should probably do something to make this setting less accessible, or at least trigger some kind of warning).

    That being said, I would have expected a clearer error message than ZSTD_error_corruption_detected.
    It could be that I only introduced the clearer error message within 0.4.3 ...

  33. #83
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    zstd -18 and -19 give the same error as -20. Also -16 and -17 decompressed output is not identical for nci and webster. -10, -14, -15 work correctly.

    Edit: updated the Silesia benchmark.
    Code:
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
    --------- ----- ----- ---- ----- ---- ----- ---- ----- ---- ----- ----- ---- -------------------
     58301860  3197 16668 3283  2155 2851  3284 1613  4286 5196  9942  5309  512 zstd 0.4.2 -15
     60459254  3351 17156 3360  2360 2891  3380 1724  4503 5228 10595  5368  537 zstd 0.4.2 -9
     73726198  4279 20155 3833  2875 3584  3776 2168  5569 6254 13748  6772  707 zstd 0.4.2 -1
     74570081  4134 20775 3792  3444 3627  3727 2140  5723 6146 13515  6761  780 zstd (original release)
    Last edited by Matt Mahoney; 8th December 2015 at 05:16.

  34. #84
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Thanks for the report, Matt.

    Finally reproduced it.
    There must be an issue specifically with 32-bits on Windows.

    The 32-bit binary is regularly tested by the continuous integration suite
    (see for example the 32-bit tests for v0.4.2 : https://travis-ci.org/Cyan4973/zstd/jobs/94409047)
    and everything seems to work fine, but these tests run on Linux.
    I couldn't reproduce the issue there.

    Since mingw doesn't work on my current Windows workstation, I use Visual instead.
    "Fortunately", the 32-bit binary produced with Visual does indeed produce wrong compressed files at -16 and higher.
    So it's reproducible.
    The 64-bit binary doesn't have this problem.

    Let's investigate ...

  35. #85
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,498
    Thanks
    741
    Thanked 664 Times in 358 Posts
    For those interested in asm optimizations: http://users.atw.hu/instlatx64/HSWvsBDWvsSKL.txt lists all the instructions whose latency/throughput changed over the last 3 CPU generations. It's just the info from the Intel optimization manual, but with all the unchanged lines filtered out. I think that Skylake has a single AVX-512 unit combined from pipes 0 & 1, which is why most AVX-256 computation instructions can now be issued from both of these ports, but not from port 5 (unlike previous designs).

  36. The Following 2 Users Say Thank You to Bulat Ziganshin For This Useful Post:

    Cyan (10th December 2015),Turtle (12th December 2015)

  37. #86
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Cyan View Post
    I've tried to use align-loops at pragma optimize level.
    But it doesn't work (no effect).

    Indeed, function level optimization control would be ideal.
    Maybe someone should file a bug report/feature request in gcc to get this working.

  38. #87
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    If attribute((hot)) does the trick, I don't see the reason to create anything else.

  39. #88
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by m^2 View Post
    If attribute((hot)) does the trick, I don't see the reason to create anything else.
    I just tried __attribute__((hot)), and it did not align the loops.

  40. #89
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    458
    Thanked 257 Times in 105 Posts
    Maybe someone should file a bug report/feature request in gcc to get this working.
    There's a request opened here :
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435

    no formal investigation started so far :
    "These values are normally strait out of the Vendors manuals."
    Last edited by Cyan; 12th December 2015 at 15:10.

  41. The Following User Says Thank You to Cyan For This Useful Post:

    nburns (13th December 2015)

  42. #90
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    A Windows 7z alternative for compression, using Cyan's Zstd v0.4.3 and srep 3.93.
    Before I get misunderstood: all credit for this goes to Cyan and Bulat, for zstd and srep.
    The code and reg files all come from the lz4 installation for Windows on Cyan's blog;
    the only thing I have done is pipeline srep with zstd and make some modifications to the code and reg files.

    Why replace 7z with this ?? .. well, find out
    Attached Files

  43. The Following User Says Thank You to Dimitri For This Useful Post:

    Cyan (13th December 2015)
