
Thread: shar2 zstd archiver

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts

    shar2 zstd archiver

    https://github.com/Shelwien/shar2/tree/master/src
    https://github.com/Shelwien/shar2/re...v0/shar2_v0.7z (there's also x64 linux binary)

    This is an implementation of a "different" archive format.
    It has a solid index at the end of the archive, like .7z/.arc, but it can still extract from a pipe
    by extracting to temp names first and then renaming once the index is reached.
    It also doesn't store file sizes explicitly (it uses escape sequences as terminators),
    so it's possible to e.g. batch-edit files like this: "shar a0 - src/ | sed -e s/foobar/foo/ | shar x - new"
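
The idea of terminators instead of stored sizes can be sketched as follows. This is only an illustration under assumptions: the escape byte and the two codes below are made up for the example and are not shar2's actual byte layout.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical framing (NOT the real shar2 format):
//   ESC 0x00 -> a literal ESC byte inside file data
//   ESC 0x01 -> end of the current file
const unsigned char ESC = 0x1B;

// Append one file's bytes to the stream, followed by a terminator.
void put_file(std::vector<unsigned char>& out, const std::string& data) {
    for (unsigned char c : data) {
        out.push_back(c);
        if (c == ESC) out.push_back(0x00);    // escape a literal ESC
    }
    out.push_back(ESC);
    out.push_back(0x01);                      // end-of-file marker
}

// Split the stream back into files without knowing any sizes up front.
// Data after the last terminator (a truncated stream) is dropped.
std::vector<std::string> get_files(const std::vector<unsigned char>& in) {
    std::vector<std::string> files;
    std::string cur;
    for (std::size_t i = 0; i < in.size(); i++) {
        if (in[i] != ESC) { cur.push_back((char)in[i]); continue; }
        if (++i >= in.size()) break;          // ESC at the very end: truncated
        if (in[i] == 0x00) cur.push_back((char)ESC);
        else { files.push_back(cur); cur.clear(); }  // any other code ends the file
    }
    return files;
}
```

Because no length field precedes a file, a filter that changes file lengths in the middle of the pipe does not invalidate the stream, as long as it leaves the escape sequences alone.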

    Known problems:

    1. Wildcards are not implemented yet, so it's only possible to create an archive from a single directory.

    2. Needs newer (2015+?) VS libs for proper support of non-English names on Windows.
    Mingw gcc default libs don't have support for locales.

    3. gcc >=8 builds a buggy exe. Builds from gcc7 and earlier work properly, while gcc8+ builds don't.
    I don't know why yet, but probably it's again too complicated for gcc to parse C++ templates.

  2. Thanks (3):

    Bulat Ziganshin (18th November 2019),Cyan (21st November 2019),Hakan Abbas (19th November 2019)

  3. #2
    Administrator Shelwien's Avatar
    1. It turned out that the gcc8+ problem was "undefined behavior" due to missing return statements in some dummy functions.

    2. Apparently setlocale(utf8) only works with VS2019 libs. I tested 2015 and 2017, and they can't handle Russian filenames in utf8.
    The thing is, "ascii" winapi functions can only work with ANSI and OEM codepages. VS2019 libs handle this by always using the unicode winapi
    and converting filenames according to the locale, while previous versions don't.
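
Regarding point 1, here is a minimal sketch of the kind of bug described. The `NullOutput` type and its methods are hypothetical and not taken from the shar2 source. In C++, letting control reach the end of a function with a non-void return type is undefined behavior, so the compiler may treat that path as unreachable and generate surprising code. Building with `-Wall` (which includes `-Wreturn-type`) reports such functions.

```cpp
#include <cstddef>

// Hypothetical stand-in for a "dummy" stub in a template-based I/O interface.
struct NullOutput {
    // Buggy form (commented out): declared to return a value, but control
    // falls off the closing brace, which is undefined behavior in C++.
    //   std::size_t write(const void*, std::size_t n) { }

    // Fixed form: every path returns a value, even if the stub does nothing.
    std::size_t write(const void*, std::size_t n) { return n; }
    bool flush() { return true; }
};
```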

  4. #3
    Administrator Shelwien's Avatar
    Switched to zstd 1.4.4. Got me a tough debug session, because my wrapper suddenly stopped working.
    Turns out, now it returns "Src size is incorrect" without a rather obscure setPledgedSrcSize call.
    Also "ZSTD_CONTENTSIZE_UNKNOWN" is 4G-1, so now I'm concerned whether it'd be able to compress a >4G stream.
    Is it documented somewhere?
    Code:
      size_t read = fread( inpbuf, 1, bufsize, f );

      ZSTD_CStream* cstream = ZSTD_createCStream();
      ZSTD_parameters params = ZSTD_getParams( compressionLevel, 0, 0 );
      ZSTD_initCStream_advanced( cstream, 0, 0, params, 0 );

      ZSTD_CCtx_setPledgedSrcSize( cstream, ZSTD_CONTENTSIZE_UNKNOWN ); // <<<<<<<<<

      ZSTD_inBuffer input = { inpbuf, read, 0 };
      ZSTD_outBuffer output = { outbuf, bufsize, 0 };
      size_t toRead = ZSTD_compressStream( cstream, &output, &input );

      // all printed values are size_t, so use %zu rather than %i
      printf( "%zu/%zu %zu/%zu %zu [%s]\n", input.pos,input.size, output.pos,output.size, toRead, ZSTD_getErrorName(toRead) );

  5. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    > Also "ZSTD_CONTENTSIZE_UNKNOWN" is 4G-1, so now I'm concerned whether it'd be able to compress a >4G stream.

    `ZSTD_CONTENTSIZE_UNKNOWN` is supposed to be a 64-bit value, so it should be much larger than 4 GB.
    But if it's truncated to 32-bit, then it becomes 4 GB - 1.

    This constant and its behavior are documented here: https://github.com/facebook/zstd/blo...ib/zstd.h#L129

    > now it returns "Src size is incorrect" without a rather obscure setPledgedSrcSize call.

    `pledgedSrcSize` is the last parameter of `ZSTD_initCStream_advanced()`.
    In the example, it's set to zero, so now the frame expects to be empty.
    Since it's not, the API returns an error code.

    An alternative is to pass `ZSTD_CONTENTSIZE_UNKNOWN` as the value of the last argument.

    Note that `ZSTD_initCStream_advanced()` is on its way out, and is now labelled as a deprecated function.
    I would recommend using only stable functions whenever an equivalent is available in the stable API.

    If I do understand correctly what you want to achieve, I would suggest the following set of invocations:

    Code:
      ZSTD_CStream* cstream = ZSTD_createCStream();

      ZSTD_CCtx_setParameter(cstream, ZSTD_c_compressionLevel, compressionLevel);  // <<<<< this is the change, it replaces 3 lines

      ZSTD_inBuffer input = { inpbuf, read, 0 };
      ZSTD_outBuffer output = { outbuf, bufsize, 0 };
      size_t toRead = ZSTD_compressStream( cstream, &output, &input );

  6. Thanks:

    Shelwien (26th November 2019)

  7. #5
    Administrator Shelwien's Avatar
    > `ZSTD_CONTENTSIZE_UNKNOWN` is supposed to be a 64-bit value, so it should be much larger than 4 GB.
    > But if it's truncated to 32-bit, then it becomes 4 GB - 1.

    I've checked, and it's actually a problem with the debug info. The value is a correct 64-bit -1, but only 32 bits of it are printed.
    Code:
    ZSTD_CCtx_setPledgedSrcSize to 4294967295 bytes 
    ZSTD_compressStream2, endOp=0  
    ZSTD_getCParams (cLevel=3) 
    ZSTD_compressStream2 : transparent init stage 
    ZSTD_getCParams (cLevel=3) 
    ZSTD_resetCStream_internal 
    ZSTD_getCParams (cLevel=3) 
    ZSTD_compressBegin_internal: wlog=21 
    ZSTD_resetCCtx_internal: pledgedSrcSize=4294967295, wlog=21
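
The effect in the log above can be reproduced without zstd. This is a small sketch with hypothetical helper names, assuming a platform where `unsigned` is 32 bits wide. `zstd.h` defines `ZSTD_CONTENTSIZE_UNKNOWN` as `(0ULL - 1)`, so the same value is used here directly.

```cpp
#include <cstdio>
#include <string>

// Format a 64-bit size the way a 32-bit debug trace would show it:
// the value is truncated to unsigned (assumed 32 bits) before printing.
std::string as_u32(unsigned long long v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%u", (unsigned)v);
    return buf;
}

// Format the same value in full, using the matching 64-bit specifier.
std::string as_u64(unsigned long long v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%llu", v);
    return buf;
}
```

With `0ULL - 1` as input, the first helper yields the 4294967295 seen in the trace, while the second shows the full 64-bit value.
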
    > `pledgedSrcSize` is the last parameter of `ZSTD_initCStream_advanced()`.
    > In the example, it's set to zero, so now the frame expects to be empty.
    > Since it's not, the API returns an error code.

    Thanks, I had simply forgotten the details of this wrapper:
    it worked for years, but then suddenly stopped.

    > If I do understand correctly what you want to achieve, I would suggest the following set of invocations :

    I used initCStream_advanced because I'm actually also setting some other
    parameters, like window size, but I guess it can be also done via setParameter() instead.

  8. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    901
    Thanks
    246
    Thanked 326 Times in 199 Posts
    Check out the Shared Brotli effort. It covers the archiving use case, too. Section 9 of its IETF draft describes the archiving format.

    https://tools.ietf.org/html/draft-va...t-04#section-9

  9. #7
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    889
    Thanks
    483
    Thanked 279 Times in 119 Posts
    > I used initCStream_advanced because I'm actually also setting some other
    > parameters, like window size, but I guess it can be also done via setParameter() instead.

    Yes, `ZSTD_CCtx_setParameter()` is the preferred method to set parameters now.
    It's been declared stable since v1.4.0, meaning it will always be supported from now on.
    All stable parameters are guaranteed to keep the same value and the same meaning going forward.

    Most parameters can be set up with this method (including experimental ones).
    There are a few additional `ZSTD_CCtx_setX()` methods, for parameters that can't be expressed using an `int`,
    for example `ZSTD_CCtx_setPledgedSrcSize()` which takes an `unsigned long long`,
    or `ZSTD_CCtx_setDictionary()` which takes a buffer.

    For more information:
    https://github.com/facebook/zstd/blo...ib/zstd.h#L233


    shar is a pretty great archiver. I remember using the first version with `lz4` on Windows many years ago.
    It worked great!

    Having the source code available on github is a big benefit of this version,
    since any user of the format will be interested in being able to decode it again in the future,
    and the source code is the ultimate "source of truth" from which a decoder can always be rebuilt if needed.

  10. Thanks (2):

    Mike (26th November 2019),Shelwien (26th November 2019)

  11. #8
    Administrator Shelwien's Avatar
    > shar is a pretty great archiver. I remember using the first version with `lz4` on Windows many years ago.

    This one is completely different. The old shar is Windows-only because of its directory scanning API.
    Windows compilers (VS and mingw) don't have a native readdir implementation for some reason.
    Also, it was only possible to work with non-English filenames via the wchar winapi.

    But the VS2019 lib is finally able to work correctly with utf8 filenames, and I found a readdir implementation based on winapi.
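
For older runtime libraries, one common workaround is to do the conversion by hand and call the wide-character functions directly. The helper below is a hypothetical sketch of that approach, not code from shar2: on Windows it converts a UTF-8 name to UTF-16 with `MultiByteToWideChar` and opens it with `_wfopen`, and elsewhere it simply falls back to `fopen`.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#include <vector>
#endif

// Hypothetical helper: open a file whose name is given as UTF-8.
FILE* fopen_utf8(const char* path, const char* mode) {
#ifdef _WIN32
    // Convert both strings to UTF-16 and use the wide-character CRT call,
    // so the ANSI/OEM codepage is never involved.
    int np = MultiByteToWideChar(CP_UTF8, 0, path, -1, nullptr, 0);
    int nm = MultiByteToWideChar(CP_UTF8, 0, mode, -1, nullptr, 0);
    if (np <= 0 || nm <= 0) return nullptr;
    std::vector<wchar_t> wpath(np), wmode(nm);
    MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath.data(), np);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode.data(), nm);
    return _wfopen(wpath.data(), wmode.data());
#else
    // On POSIX systems the name bytes are passed through unchanged.
    return std::fopen(path, mode);
#endif
}
```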

    And the archive format is completely different here.
    The old shar had file headers like .tar or .zip, while shar2 keeps a solid archive index at the end (like .7z),
    while using some tricks to extract files without the index.
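
One such trick, extracting from a pipe before the index is available, can be sketched like this. The `PipeExtractor` type, its method names and the `shar_tmp_N` naming are hypothetical and only illustrate the idea of "write under temp names first, rename once the index arrives".

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical extractor state: file bodies arrive first, names arrive last.
struct PipeExtractor {
    std::vector<std::string> temp_names;

    // Called for each file body as it is decoded from the pipe.
    bool store_body(const std::string& data) {
        std::string tmp = "shar_tmp_" + std::to_string(temp_names.size());
        FILE* f = std::fopen(tmp.c_str(), "wb");
        if (!f) return false;
        bool ok = std::fwrite(data.data(), 1, data.size(), f) == data.size();
        ok = (std::fclose(f) == 0) && ok;
        temp_names.push_back(tmp);
        return ok;
    }

    // Called once the index at the end of the archive has been read.
    bool apply_index(const std::vector<std::string>& real_names) {
        if (real_names.size() != temp_names.size()) return false;
        bool ok = true;
        for (std::size_t i = 0; i < real_names.size(); i++)
            ok = (std::rename(temp_names[i].c_str(), real_names[i].c_str()) == 0) && ok;
        return ok;
    }
};
```

A real extractor would also have to create directories and clean up the temp files if the stream ends before the index is seen.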

  12. Thanks:

    Cyan (27th November 2019)
