
Thread: Standard compression library API

  1. #1
    Bulat Ziganshin (Programmer)

    Standard compression library API

    Many recent discussions here have circled around the same key topic - a standard API for compression libraries. Why do we need such a standard? It would allow development of compression code to be separated from development of applications and utility code. Below I describe two existing standards - 7-zip and FreeArc - while Evan has provided a description of Squash:

    Squash


    7-zip sources include two APIs - the first is for archive/compressor formats (such as 7z and xz), and the second is for codecs/filters (such as lzma and bcj). By implementing the second API, you can add your compression method to your own build of 7z.exe, or you can compile your code into a DLL that can be dropped into any 7-zip installation. Moreover, since other archivers also include 7z.dll, your codec will work with them too. Examples include PowerArchiver and FreeArc (codecs can be used to create and extract 7z/zip archives, just as in 7-zip itself) and RAR (codecs can be used only to extract 7z archives). With ~100 million 7z.dll installations, this makes the 7-zip API far more widely supported than any alternative.


    FreeArc has two APIs. The first is the internal API implemented by its own methods. These are derived from the COMPRESSION_METHOD class and provide all the functions ever required by the freearc, unarc, fazip and fa'next projects. This includes:
    - a single-shot (buffer-to-buffer) [de]compression API, with an option to keep the memory allocated for State between calls
    - a callback-style streaming [de]compression API
    - methods to query/modify the dictionary, block size, compression memory and decompression memory. This allows informing the user about the memory needs of the selected compression method, reducing the method when the computer doesn't have enough memory, and even modifying the method to support extraction on low-end boxes
    - a universal "doit" method which allows implementing custom APIs that query or modify the compression method

    If a method doesn't implement the buffer compression API, it can be emulated via the callback API. OTOH, 4x4 implements a multi-threaded callback API on top of any compressor implementing the buffer API (directly or indirectly). This means we can add simple multi-threading to any compression method.
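
    Just to illustrate the first point, here is a minimal sketch (not the actual FreeArc code; compress_callback stands for a codec's hypothetical streaming entry point) of how a buffer-to-buffer call can be emulated on top of a callback-style [de]compressor:

    #include <string.h>

    typedef int ReadFunc  (void* ud, void* buf, int size);   // returns bytes read, 0 at EOF, <0 on error
    typedef int WriteFunc (void* ud, void* buf, int size);   // returns bytes written, <0 on error

    struct MemStreams { char *in;  int insize,  inpos;
                        char *out; int outsize, outpos; };

    static int mem_read (void* ud, void* buf, int size)
    {
        struct MemStreams *s = ud;
        int n = s->insize - s->inpos;  if (n > size)  n = size;
        memcpy (buf, s->in + s->inpos, n);  s->inpos += n;
        return n;
    }

    static int mem_write (void* ud, void* buf, int size)
    {
        struct MemStreams *s = ud;
        if (s->outpos + size > s->outsize)  return -1;        // output buffer too small
        memcpy (s->out + s->outpos, buf, size);  s->outpos += size;
        return size;
    }

    // buffer-to-buffer compression expressed via the streaming (callback) entry point
    int compress_buf (struct MemStreams *s)
    {
        return compress_callback (mem_read, mem_write, s);
    }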

    Multiple compression methods can be chained. Moreover, compression methods can be combined into a tree where each method may have multiple outputs, each processed by its own subtree of methods, with forthcoming support in fa'next.

    Ciphers are supported as a particular case of compression methods.

    Each compression/encryption method may have a string of parameters which is saved into the archive; there are auxiliary functions for parsing integers, doubles and memory sizes.

    There are 4 client programs employing this API: freearc, unarc, fazip and fa'next. unarc is an open-source .arc decompressor, while fazip is an open-source single-file compressor that still lacks fileID and hash-checking features, so it can't be used in a real environment.

    The second FreeArc API is the CLS DLL, which supports only parameters and callback-based [de]compression, but is extremely simple to implement. There are about a dozen 3rd-party CLS plugins available.

    My goal is to bring the full power of the existing internal FreeArc API to CLS while keeping the simplicity of the current CLS design, add the features mentioned in neighbouring threads, and encourage developers to implement/use the resulting API in their projects, so that any newly developed compressor may be immediately used in 7-zip, FreeArc, PA and other real-world tools.
    Last edited by Bulat Ziganshin; 19th February 2017 at 00:37.

  2. #2
    Bulat Ziganshin (Programmer)
    I want to develop a new standard API that meets the following goals:
    - simplicity. Especially simplicity of codec definition, as opposed to simplicity of applications using the API
    - extensibility. It should be possible to add new features by changing only a particular codec and a particular application, without modifications to the framework and other codecs/apps. Multiple extensions can coexist in the same codebase
    - portability. It should be easy to port the framework to other languages
    - OTOH, performance is almost a non-issue. It is enough if the API can serve a million empty calls per second

    8 years ago I started a similar initiative with CLS, but it had some drawbacks and wasn't fully implemented. Now I see how to fix those drawbacks using Squash ideas, so I'm back with a refreshed proposal:

    The API includes 3 layers: registration, parsing and coding.


    Registration

    If a codec is loaded from a DLL/SO file, the framework tries to execute the function
    i64 CelsInit (u32 service, CelsCallback* cb, void* ud);

    It is called with service set to the CELS_LOAD constant on dll load, CELS_UNLOAD on dll unload (the last operation performed with the dll), and possibly other codes in future extensions. The callback cb should be used to register the methods provided by the DLL:
    // Register CELS compressors from a DLL
    i64 CelsInit (u32 service, CelsCallback* cb, void* ud)
    {
        if (service==CELS_LOAD) {
            cb (ud, CELS_REGISTER, "lzma", LzmaMain, NULL /*LzmaMain self*/);
            cb (ud, CELS_REGISTER, "bcj",  BcjMain,  NULL /*BcjMain self*/);
            return CELS_OK;
        }
        else return CELS_NOT_IMPLEMENTED;
    }




    Code compiled directly into the executable (rather than into a DLL) can register methods with CelsRegister:
    // Register builtin CELS compressor
    static int dummy = CelsRegister ("lzma", LzmaMain, NULL /*LzmaMain self*/);


    If a dll lacks the CelsInit function but has CelsMain, the framework registers it automatically, using "xxx" as the method name for cels-xxx.dll:
    CelsRegister ("xxx", CelsMain, NULL /*CelsMain self*/);
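
    For illustration, a minimal sketch of what the framework's loader might do on a POSIX system (LoadCelsPlugin and FrameworkCallback are hypothetical names; only CelsInit, CelsMain and CelsRegister come from the proposal):

    #include <dlfcn.h>

    // Hypothetical loader sketch: open cels-xxx.so, prefer CelsInit, fall back to CelsMain
    int LoadCelsPlugin (const char *path, const char *name /* the "xxx" from cels-xxx */)
    {
        void *dll = dlopen (path, RTLD_NOW);
        if (!dll)  return -1;

        typedef i64 CelsInitFunc (u32 service, CelsCallback* cb, void* ud);
        CelsInitFunc *init = (CelsInitFunc*) dlsym (dll, "CelsInit");
        if (init)
            return (int) init (CELS_LOAD, FrameworkCallback, NULL);   // the codec registers itself via the callback

        CelsCallback *cels_main = (CelsCallback*) dlsym (dll, "CelsMain");
        if (cels_main)
            return (int) CelsRegister (name, cels_main, NULL);        // auto-register under "xxx"

        dlclose (dll);
        return -1;    // neither entry point found
    }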



    Parsing

    The main function may be called by the framework to get generic information about the codec, such as its priority and compression support, but its main duty is parsing the corresponding compression method:
    i64 LzmaMain (void* self, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize)
    {
        if (service==CELS_PARSE) {
            char** params = (char**) inbuf;
            LZMA_METHOD *method = (LZMA_METHOD*) outbuf;
            method->code = LzmaCode;
            // parse params and store the results into the method fields...
            return sizeof(LZMA_METHOD);   // return the size of the parsed method, or errcode<0 on error
        }
        else if (service==CELS_GET_PRIORITY) {
            return 0;
        }
        else return CELS_NOT_IMPLEMENTED;
    }



    CelsRegister can register nameless parsers or partial-name parsers, so a single parser can deal with multiple codecs:
    CelsRegister ("", LzmaBcjMain, NULL);            // will be called for every compression method, slowing down the entire parsing
    CelsRegister ("aes*", EncryptionMain, NULL);     // will be called for aes-128, aes-256 and so on
    CelsRegister ("serpent*", EncryptionMain, NULL);

    The inbuf parameter passed to the parser contains a list of the method name and its parameters, built from the original string by splitting at ":":
    // if the original method looks like this
    char *method = "aes-128:i10:r1000";
    // ... then the inbuf sent to the parser will look like
    char *params[] = {"aes-128", "i10", "r1000", NULL};
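
    A minimal sketch of that split (the helper name is hypothetical; the string is modified in place, with each ":" replaced by a terminator):

    #include <string.h>

    // Hypothetical sketch: split "aes-128:i10:r1000" into a NULL-terminated params array
    static int SplitMethod (char *method, char **params, int max_params)
    {
        int n = 0;
        for (char *p = method;  p && n < max_params-1;  )
        {
            params[n++] = p;
            p = strchr (p, ':');
            if (p)  *p++ = '\0';
        }
        params[n] = NULL;
        return n;    // number of strings stored in params[]
    }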

    The outbuf parameter contains a fresh buffer 4 KB long (CELS_MAX_METHOD_STRUCTURE_SIZE) - on success the parser should fill it with a binary structure containing the parsed method parameters and return the size of this particular structure. This allows parsing to be performed without malloc calls. Example of a parsed structure:
    struct LZMA_METHOD
    {
        CelsCodec *code;
        size_t     dict;
        // ...
    };



    Coding

    On a successful parse, the framework gets a handmade object - a binary structure that includes a method to handle all operations on the structure. This method should be called for [de]compression, querying/modifying method parameters and so on. It should also support two important operations:
    - conversion back to a string. Since FreeArc tends to hold methods as strings, this should be used after applying methods that modify the structure, in order to retain those changes. I.e. parse("lzma:1g").SetDict(1<<20).unparse() ==> "lzma:1m"
    - conversion back to a string for saving into the archive. This is similar to the former, but drops details specific to this particular compression operation. F.e. 4x4 drops the limits on compression and I/O buffers, ciphers drop the key (on decompression it should be recomputed from the password once again) and so on
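
    A minimal sketch of what the first conversion might look like for the LZMA_METHOD structure above (the helper and the formatting logic are hypothetical; FreeArc has its own routines for showing memory sizes):

    #include <stdio.h>

    // Hypothetical sketch: turn a parsed LZMA_METHOD back into its string form, e.g. dict==1<<20 ==> "lzma:1m"
    static i64 LzmaUnparse (LZMA_METHOD *m, char *out, size_t outsize)
    {
        size_t dict = m->dict;
        char suffix = 'b';
        if      (dict % (1<<30) == 0)  { dict >>= 30;  suffix = 'g'; }
        else if (dict % (1<<20) == 0)  { dict >>= 20;  suffix = 'm'; }
        else if (dict % (1<<10) == 0)  { dict >>= 10;  suffix = 'k'; }
        int n = snprintf (out, outsize, "lzma:%zu%c", dict, suffix);
        return (n < 0 || (size_t)n >= outsize) ? -1 : n;    // errcode<0 on error, otherwise string length
    }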

    The codec function may look like this:
    i64 LzmaCode (void* self, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize)
    {
        LZMA_METHOD *lzma = (LZMA_METHOD*) self;
        if (service==CELS_COMPRESS) {
            // read input data with cb(ud, CELS_READ, buf, size) and
            // write compressed data with cb(ud, CELS_WRITE, buf, size) ...
            return CELS_OK;
        }
        else return CELS_NOT_IMPLEMENTED;
    }
    Last edited by Bulat Ziganshin; 22nd February 2017 at 17:10.


  4. #3
    Bulat Ziganshin (Programmer)
    The textwall above describes the framework from the codec POV - i.e. what a codec needs to implement. Now let's consider the framework from the app POV:

    There are two API levels. The high-level API is single-shot, combining the parse and the use of the parsed structure:

    // Compress data using the callback API and the buffer API
    Cels("lzma:1g", CELS_COMPRESS, callback, self);
    Cels("lzma:1g", CELS_COMPRESS_BUF, inbuf, insize, outbuf, outsize);

    // Query the compression method
    u64 mem = Cels("lzma:1g", CELS_GET_COMPRESSION_MEMORY);

    // Modify the compression method
    char modified_method[CELS_MAX_METHOD_STRINGS_SIZE];
    Cels("lzma:1g", CELS_SET_COMPRESSION_MEMORY, 1<<20, modified_method);
    // modified_method=="lzma:1m"


    The low-level API allows reusing the parsed compression method:

    // Parse the compression method and store parsed form into the lzma[] buffer
    char lzma[CELS_MAX_METHOD_STRUCTURE_SIZE];
    Cels("lzma:1g", CELS_PARSE, lzma);

    // Call the parsed method to compress data
    CallCels(lzma, CELS_COMPRESS, callback, self);
    CallCels(lzma, CELS_COMPRESS_BUF, inbuf, insize, outbuf, outsize);

    // Modify the parsed method
    CallCels(lzma, CELS_SET_COMPRESSION_MEMORY, 1<<20);

    // Unparse the modified method
    char modified_method[CELS_MAX_METHOD_STRINGS_SIZE];
    CallCels(lzma, CELS_UNPARSE, modified_method);
    // modified_method=="lzma:1m"


    And that defines the entire framework. Note that aside from CELS_[UN]LOAD/CELS_REGISTER/CELS_[UN]PARSE, the framework doesn't define any services - CELS_COMPRESS, CELS_READ and CELS_SET_COMPRESSION_MEMORY are just conventions between application and codec. While we will have a pretty wide set of such conventions, they don't need to be enforced by the framework itself - all the functionality the framework provides is to [un]load the DLL, register codecs and [un]parse the compression method string into a binary structure. The basic framework doesn't provide other features, so it doesn't need to enforce the implementation and behavior of other services.

    F.e. an auxiliary function that combines multiple compressors into a pipe can be considered a framework extension relying on all the combined methods implementing the API extension, namely the CELS_COMPRESS service. If any of them doesn't, the function will simply fail.
    Last edited by Bulat Ziganshin; 19th February 2017 at 00:44.


  6. #4
    Bulat Ziganshin (Programmer)
    Let's see how it fulfills the desired goals:

    portability. It should be easy to port the framework to other languages
    The entire API is described in terms of C functions, integers, pointers and ASCIIZ strings, which fits the C ABI feature set. It should be easy for any language, compiled or dynamic, to bind/implement these functions.

    A codec implementation needs to export 2-3 functions with the C ABI. A framework binding needs to import another 2-3 functions with the C ABI. Plus both need to define a large set of CELS_* constants. I plan to maintain the constant list in a simple textual form, so translating it into the form appropriate for a specific language would be a simple sed one-liner.

    So, a binding for another language should be a half-hour project. The most complex part is probably wrapping callbacks into C-callable functions. Compatibility with any C/C++ compiler is automatic.
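
    For illustration, assuming the constant list is kept one per line as "NAME value" pairs (the file names here are hypothetical), generating, say, Python definitions really is a one-liner:

    # hypothetical cels_constants.txt contains lines like "CELS_COMPRESS 1"
    sed 's/^\(CELS_[A-Z0-9_]*\) *\(.*\)$/\1 = \2/' cels_constants.txt > cels_constants.py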

    extensibility. It should be possible to add new features by changing only a particular codec and a particular application, without modifications to the framework and other codecs/apps. Multiple extensions can coexist in the same codebase
    The idea of this approach is that almost any service provided by the compression library or framework is already an extension to the tiny list of basic services. So all services have equal rights, and almost any service can be left undefined at the will of the codec author. I optimized the framework for simplicity of use of the services I already need, in the hope that the resulting framework will be equally convenient for future services that anyone may want to implement.

    This is why the framework includes only a few functions, with all specific services defined by constants. This approach allows adding new services without any need to modify the framework or the language/codec bindings. +100 points to extension portability!

    Instead of providing a lot of APIs each with only a few specific parameters, we provide only a few APIs each with a lot of generic parameters:
    typedef i64 CelsCallback (void* self, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize);

    i64 LzmaMain (void* self, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize);
    i64 LzmaCode (void* self, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize);

    i64 Cels (char* method, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize);
    i64 CallCels (void* method, u32 service, CelsCallback* cb, void* ud, void* inbuf, i64 insize, void* outbuf, i64 outsize);

    This parameter list is built to meet the needs of all existing FreeArc services. It is supposed that it will be enough for 99% of future services too. In the few remaining cases, one of the pointers may point to a structure storing the remaining params. Alternatively, the remaining params may be returned via queries to the callback.

    All the API functions, including the callbacks themselves, are supplied with a callback and userdata that should be passed as the first parameter to this callback. This allows building chains of nested callback calls, implementing arbitrarily complex service scenarios.

    simplicity. Especially simplicity of codec definition, as opposed to simplicity of applications using the API
    Simplicity of the codec binding isn't ideal. First, CelsMain/CelsCode needs to dispatch services. Second, each service needs to check that unused parameters are set to 0, although that's a matter of agreement - if it doesn't perform the checks, the service cannot later be extended with non-0 values of these parameters.

    Third is feature-checking. If an application wants to know, f.e., whether "lzma" supports the buffer API, it should either try to call this service, or the service should be able to answer such questions itself. If services are provided by individual functions, as in Squash, feature-checking is trivial.
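
    A minimal sketch of the second option (CELS_IS_SUPPORTED and the helper are hypothetical, just an agreed-upon convention; only CELS_NOT_IMPLEMENTED and CELS_COMPRESS_BUF come from the proposal). The queried service code is passed in an integer slot, as for other enum parameters:

    // Hypothetical sketch: ask the codec itself whether it implements the buffer compression service
    static int SupportsBufferApi (void *method)
    {
        i64 r = CallCels (method, CELS_IS_SUPPORTED, 0,0, 0,CELS_COMPRESS_BUF, 0,0);
        return r > 0;    // >0: supported; 0 or CELS_NOT_IMPLEMENTED: treat as unsupported
    }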

    All these problems can be solved by providing a C++ class dispatching services to individual class methods (as is already implemented in CLS).

    Finally, one has to write code implementing parameter [un]parsing. While FreeArc already includes supplementary functions for [un]parsing numbers and memory sizes, this may be simplified further by providing a Squash-like parameter-definition API with automatic [un]parsing. This would also ensure that unparsing really matches parsing.
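
    A sketch of what such a declarative definition might look like (the table layout, the "d" prefix and the helper names are all hypothetical, merely in the spirit of Squash's options):

    #include <stddef.h>

    // Hypothetical sketch: declare parameters once, let the framework [un]parse and range-check them
    enum ParamType { PARAM_INT, PARAM_MEMSIZE };

    typedef struct {
        const char    *prefix;    // e.g. "d" for a hypothetical "lzma:d64m"
        enum ParamType type;
        size_t         offset;    // offsetof the field inside the parsed structure
        i64            min, max;  // allowed range, checked automatically
    } ParamDef;

    static const ParamDef lzma_params[] = {
        { "d", PARAM_MEMSIZE, offsetof(LZMA_METHOD, dict), 1<<16, (i64)1<<31 },
        // { "fb", PARAM_INT, ... }  and so on
        { NULL }
    };
    // a generic CelsParseParams(params, lzma_params, outbuf) could then replace hand-written
    // parsing in every codec, and CelsUnparseParams() would guarantee the exact reverse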

    simplicity. Especially simplicity of codec definition, as opposed to simplicity of applications using the API
    Simplicity of application usage isn't ideal either, mainly because every framework call includes a lot of 0 parameters in addition to the actual parameters, sometimes placed in an awkward order. But this can be solved by providing API wrappers implementing individual services with just the parameters required:
    #define Compress(method,cb,ud) Cels(method,CELS_COMPRESS,cb,ud,0,0,0,0)
    #define CompressBuf(method,b1,s1,b2,s2) Cels(method,CELS_COMPRESS_BUF,0,0,b1,s1,b2,s2)

    This means extra work for the language binding author, so he has a choice of what to implement - either a quick-and-dirty, uncomfortable API or a convenient API with a bit more work.
    Last edited by Bulat Ziganshin; 19th February 2017 at 17:50.

  7. #5
    Member (United States)
    Sorry about the delay.

    Squash is designed as an abstraction layer for compression libraries; i.e., exactly what Bulat is talking about. There is a core library (libsquash), and a set of plugins (IIRC there are a little over 30 so far). Looking at it from the compression codec's point of view (since I think that's what people care about around here):

    Each library has a different plugin, and each plugin can implement multiple codecs; i.e., the zlib plugin provides both "zlib" and "gzip" codecs. Codecs aren't necessarily unique; multiple plugins can implement the same codec, and codecs with the same name are considered compatible. For example, Squash also has plugins for miniz and zlib-ng, which implement the same codecs as the zlib plugin.

    Plugins are shared libraries (DLLs on Windows, dylib on OS X, shared objects pretty much everywhere else), plus an ini file to describe some basics such as the library's license, as well as some basic information about each codec including

    * Name (required)
    * License(s)
    * Extension — to match file extensions to a codec (e.g., so we know the "gzip" plugin should be used for *.gz files)
    * MIME type — used to match a codec to a MIME type
    * Priority — used to determine which codec takes priority if there are multiple plugins implementing the same codec.

    Basically, the ini file is used to grab information about the codec without actually loading the module. This is technically advantageous since loading libraries is relatively slow and bumps memory usage quite a bit if you have a lot of plugins. It's also critical for filtering codecs based on the license; for example, a proprietary application can't legally use the lzo plugin since liblzo is GPL.

    Once the plugin is actually loaded (dlopen/LoadLibrary), a function is executed which fills a struct with some callbacks, which is how the plugins connect Squash to the compression libraries. There are several possible APIs a codec can implement:

    * Streaming (zlib-style, where you put next_in/avail_in/next_out/avail_out in a struct)
    * All-in-one (provide the entire input at once)
    * Splicing (provide fread/fwrite-style callbacks)

    There is also a required function which returns the maximum amount of space required for *any* input of <= N bytes (think compressBound from zlib). Streaming and splicing require a single callback each, and all-in-one requires one function for compression and one for decompression.
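
    For readers unfamiliar with the zlib-style streaming pattern mentioned above, here is a minimal sketch of that calling convention, using zlib itself (this is just the style of API a streaming plugin wraps, not Squash's plugin interface):

    #include <zlib.h>

    // Minimal zlib-style streaming call: feed input via next_in/avail_in, drain output via next_out/avail_out
    int deflate_buffer (const unsigned char *in, size_t insize, unsigned char *out, size_t outsize)
    {
        z_stream strm = {0};
        if (deflateInit (&strm, Z_DEFAULT_COMPRESSION) != Z_OK)  return -1;

        strm.next_in  = (unsigned char*) in;   strm.avail_in  = insize;
        strm.next_out = out;                   strm.avail_out = outsize;

        int ret = deflate (&strm, Z_FINISH);   // single-shot here; real code loops while it returns Z_OK
        deflateEnd (&strm);
        return ret == Z_STREAM_END ? (int)(outsize - strm.avail_out) : -1;   // compressed size or error
    }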

    No matter which API you implement, Squash provides all three APIs to the caller. If the caller uses an API the plugin doesn't provide, Squash will implement it using an API you do provide. This means buffering if you only provide an all-in-one API, so it's best to implement one of the others.

    You can choose to implement multiple APIs, in which case the API Squash uses will depend on what the user calls.

    If you desire, you can also implement the Options API, which provides a relatively simple declarative mechanism for people to pass options to your codec. They will automatically be parsed, and some limited checking can be performed (such as checking that ints are within a specified range). Currently we allow ints, sizes, enums, and strings, but more possibilities can be added if necessary.

    There is more information about this in Squash's documentation; see https://github.com/quixdb/squash/blo...s/internals.md and https://github.com/quixdb/squash/blo...lugin-guide.md.

    Plugins can, but don't have to be, distributed with Squash. There is also a separate project (squash-plugins-extra) for additional open-source codecs which aren't acceptable for distribution with Squash due to issues with memory safety (e.g., pithy) and/or portability (e.g., lzsse). It's also possible for third parties to distribute plugins completely outside Squash's control.

    Codecs are really quite easy to implement; there is a bit of boilerplate, but generally it's possible to use one of the 30+ existing codecs as a guide (one of which probably has a similar API to your library) to get one done in a few minutes.



    As for the benefits of actually having a Squash plugin:

    Squash also provides lots of smaller utilities. For example, there is a stdio-style API for reading from/writing to files, and a file-to-file API for compressing from one FILE* to another. There are some nice optimizations in there, for example using memory-mapped files in the file-to-file API for all-in-one codecs.

    Squash has two primary goals:

    * For consumers, provide a single API for compression, regardless of which codec is used.
    * For codecs, provide everything *except* the actual compression code. This includes:
      * More user-friendly APIs
      * Bindings for other languages
      * A CLI
      * Packaging for distributions
      * A rather diabolical set of tests
      * Impartial benchmarking on many different systems

    One thing Bulat mentioned that is specifically outside the scope of Squash, at least for now, is archiving. I'm interested in doing something at some point, but I think it would be more appropriate to have it in a separate library (using Squash). Besides, this is basically a duplication of libarchive.

    For information on some of the C APIs Squash exposes, you might want to take a look at the User Guide.

    Codecs which are shipped with Squash are tested pretty heavily, including through AddressSanitizer and ThreadSanitizer (and I'm working on UBSanitizer) and on big endian. We have a lot of CI set up (for Linux, OS X, and Windows) testing lots of different compilers and versions. I also do what I can to fuzz codecs when I have time.


  9. #6
    Bulat Ziganshin (Programmer)
    Let's consider how we can implement the FreeArc APIs with Cels. Note: I will show only the significant function parameters; the remaining call params should be set to 0. In almost all cases, integer slots are used to pass integer/boolean/enum parameters, while pointer slots are used to pass strings and pointers to buffers/structures/functions.

    // Parsing
    char method[CELS_MAX_METHOD_STRUCTURE_SIZE];
    i64 errcode_or_size = Cels("lzma:1g", CELS_PARSE, method);
    // errcode_or_size<0: error code
    // errcode_or_size>0: size of parsed structure stored in the method[] buffer

    // Querying integer method parameters
    i64 errcode_or_mem = CallCels(method, CELS_GET_COMPRESSION_MEMORY);

    // Querying string method parameters
    i64 errcode_or_outsize = CallCels(method, CELS_GET_ENCRYPTION_KEY, outbuf, outbuf_size);

    // Changing method parameters
    i64 errcode = CallCels(method, CELS_SET_COMPRESSION_MEMORY, mem);

    // Unparsing the modified method
    // FULL: all params for later use in compression
    // PURE: only decoding params for storing in archive and following use in decompression
    char method_str[CELS_MAX_METHOD_STRINGS_SIZE];
    i64 errcode = CallCels(method, CELS_UNPARSE, method_str, CELS_UNPARSE_FULL);
    i64 errcode = CallCels(method, CELS_UNPARSE, method_str, CELS_UNPARSE_PURE);
    // method_str=="lzma:1m"

    // Compression
    i64 errcode_or_outsize = CallCels(method, CELS_COMPRESS_BUF, inbuf, insize, outbuf, outbuf_size);
    i64 errcode = CallCels(method, CELS_COMPRESS, callback, self);

    // Destroying the parsed compression method structure (in case it allocated additional memory)
    i64 errcode = CallCels(method, CELS_FREE);


    Every CallCels invocation can be supplied with callback+userdata arguments. The codec may use this callback to provide additional services/information to the caller:
    // Provide a non-standard error message string, giving the user more information about the problem encountered
    i64 errcode = cb(ud, CELS_ERROR_MESSAGE, str);

    // Provide a non-standard ZSTD-specific error code
    i64 errcode = cb(ud, CELS_ZSTD_ERROR, errcode);

    // Inform user about the [de]compression operation progress
    i64 errcode = cb(ud, CELS_PROGRESS, insize, outsize);


    Of course, the callback is mandatory for callback-based [de]compression operations. It's called with the following params:
    i64 errcode_or_size = cb(ud, CELS_READ, buf, size, 0/CELS_PARTIAL_OPERATION, num_stream);
    i64 errcode_or_size = cb(ud, CELS_WRITE, buf, size, 0/CELS_PARTIAL_OPERATION, num_stream);
    // errcode_or_size<0: error code
    // errcode_or_size>0: bytes processed
    By default the entire buffer should be processed, or an error code is returned. CELS_PARTIAL_OPERATION allows processing only part of the buffer and returning the processed size. num_stream may be non-zero for codecs like BCJ2.
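
    A minimal sketch of how a codec might use CELS_PARTIAL_OPERATION to fill its window even when the data source delivers short reads (the helper is hypothetical; the call convention follows the lines above):

    // Hypothetical sketch: read exactly 'size' bytes (or up to end of input) using partial reads
    static i64 ReadFully (CelsCallback *cb, void *ud, char *buf, i64 size)
    {
        i64 total = 0;
        while (total < size)
        {
            i64 n = cb (ud, CELS_READ, buf+total, size-total, CELS_PARTIAL_OPERATION, 0);
            if (n < 0)   return n;      // propagate the error code
            if (n == 0)  break;         // end of the input stream
            total += n;
        }
        return total;
    }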


    Finally, there is an API to reuse the compression context (dictionary, hashtable...) once it has been allocated:
    // Alloc the compression context inside method[]
    i64 errcode = CallCels(method, CELS_PREPARE_COMPRESS_BUF);
    i64 errcode = CallCels(method, CELS_PREPARE_COMPRESS);

    // Compression operations use the compression context stored in method[] if available
    i64 errcode_or_outsize = CallCels(method, CELS_COMPRESS_BUF, inbuf, insize, outbuf, outbuf_size);
    i64 errcode = CallCels(method, CELS_COMPRESS, callback, self);

    // Destroy the compression context inside method[]
    i64 errcode = CallCels(method, CELS_FREE_COMPRESS_BUF);
    i64 errcode = CallCels(method, CELS_FREE_COMPRESS);

    // Destroying the method[] also destroys any builtin compression contexts
    i64 errcode = CallCels(method, CELS_FREE);

    Note that every operation (CELS_COMPRESS, CELS_DECOMPRESS, CELS_COMPRESS_BUF, CELS_DECOMPRESS_BUF) has its own context, independent of the contexts of other operations. And since all contexts are stored inside method[], different method buffers naturally have independent contexts.


    Any operation/callback may return an error code. In most cases it's a signal to stop the operation as early as possible and return this error code back to the higher level. A notable exception is the CELS_NOT_IMPLEMENTED code - if it is returned for a non-mandatory, informational call like CELS_PROGRESS, this code should be ignored and processing continued as for CELS_OK==0. Also, we can ignore error codes from the PREPARE operations.
    Last edited by Bulat Ziganshin; 19th February 2017 at 17:54.

  10. #7
    Bulat Ziganshin (Programmer)
    And finally two examples of using the CELS framework. They should look familiar if you have previously seen the CLS examples.

    The first is an application employing the framework to compress data from stdin to stdout with a codec named "test":
    #include <stdio.h>

    i64 __cdecl ReadWrite (void* self, int service, void* inbuf, i64 insize, void* outbuf, i64 outsize, CelsCallback* cb, void* ud)
    {
        if (service==CELS_READ) {
            return fread (outbuf, 1, outsize, stdin);
        }
        else if (service==CELS_WRITE) {
            return fwrite (inbuf, 1, insize, stdout);
        }
        else return CELS_NOT_IMPLEMENTED;
    }

    int main ()
    {
        return Cels ("test", CELS_COMPRESS, 0,0, 0,0, ReadWrite,0) < 0;   // non-zero exit code on error
    }


    The second is an implementation of the simplest codec - it ignores all extra codec parameters in CELS_PARSE, and copies data intact in CELS_COMPRESS and CELS_DECOMPRESS:

    i64 __cdecl CelsCode (void* self, int service, void* inbuf, i64 insize, void* outbuf, i64 outsize, CelsCallback* cb, void* ud)
    {
        if (service==CELS_COMPRESS || service==CELS_DECOMPRESS) {
            char buf[4096];
            while (i64 len = cb (ud, CELS_READ, 0,0, buf,4096, 0,0))
            {
                if (len<0)  return len;   // return errcode on error
                i64 result = cb (ud, CELS_WRITE, buf,len, 0,0, 0,0);
                if (result != len)  return result<0? result : CELS_ERROR_WRITE;
            }
            return CELS_OK;
        }
        else return CELS_NOT_IMPLEMENTED;
    }

    i64 __cdecl CelsMain (void* self, int service, void* inbuf, i64 insize, void* outbuf, i64 outsize, CelsCallback* cb, void* ud)
    {
        if (service==CELS_PARSE) {
            *(CelsFunction**)outbuf = CelsCode;
            return sizeof(CelsFunction*);
        }
        else return CELS_NOT_IMPLEMENTED;
    }

    static i64 dummy = CelsRegister ("test", CelsMain, NULL);

  11. #8
    Bulat Ziganshin (Programmer)
    Now let's look into the implementation of asynchronous data exchange between multiple compression algorithms in a chain like rep+exe+delta+zstd.

    The current API has only two synchronous "I/O" operations - READ(buf,size) and WRITE(buf,size). Internally, the WRITE operation in algo1 sends a FILLED_BUF event to algo2, algo2 copies the data at READ requests, and when the buffer is exhausted, it sends an EMPTY_BUF event back, allowing algo1 to continue its work.

    We can extend this protocol to support BOTH the old synchronous and the new asynchronous operations, moreover allowing a sync reader to be combined with an async writer and vice versa. Instead of events, we implement two async queues with the same names - FILLED_BUFS and EMPTY_BUFS. The FILLED_BUFS queue contains buffers filled by algo1 but not yet consumed by algo2, while EMPTY_BUFS contains empty buffers ready to be filled by algo1. The new async callbacks are RECEIVE_EMPTY_BUF/SEND_FILLED_BUF on the writer side and RECEIVE_FILLED_BUF/SEND_EMPTY_BUF on the reader side, which deal directly with the queues.

    The old WRITE callback is emulated by issuing SEND_FILLED_BUF followed by RECEIVE_EMPTY_BUF - since it's the only buffer in the queues, RECEIVE will wait until the data are copied and then get the buffer back. The old READ(buf,size) callback is implemented by receiving buffers from the FILLED_BUFS queue, copying data into buf, and sending the buffers to the EMPTY_BUFS queue. So both operations are implemented in the old way, except for the new method of information exchange - queues instead of events (which essentially served as single-entry queues).
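
    A minimal sketch of that emulation (the Boundary structure and the queue helpers are hypothetical; SEND_*/RECEIVE_* correspond to the callbacks named above):

    #include <string.h>

    // Assumed primitives: a blocking queue of {buf,size} entries
    typedef struct Queue Queue;
    i64  queue_get (Queue *q, char **buf, i64 *size);    // blocks; returns <0 when the queue is closed
    void queue_put (Queue *q, char *buf, i64 size);

    typedef struct {
        Queue *filled_bufs, *empty_bufs;
        char  *cur;  i64 cur_size, cur_pos;               // reader's partially drained buffer, if any
    } Boundary;

    // writer side: SEND_FILLED_BUF, then RECEIVE_EMPTY_BUF - blocks until the reader has copied the data
    static i64 SyncWrite (Boundary *b, char *buf, i64 size)
    {
        queue_put (b->filled_bufs, buf, size);
        return queue_get (b->empty_bufs, &buf, &size);    // our own buffer comes back once it's drained
    }

    // reader side: RECEIVE_FILLED_BUF, copy data out, SEND_EMPTY_BUF once a buffer is fully drained
    static i64 SyncRead (Boundary *b, char *dst, i64 size)
    {
        i64 total = 0;
        while (total < size)
        {
            if (!b->cur) {
                if (queue_get (b->filled_bufs, &b->cur, &b->cur_size) < 0)  break;   // no more data
                b->cur_pos = 0;
            }
            i64 n = b->cur_size - b->cur_pos;  if (n > size-total)  n = size-total;
            memcpy (dst+total, b->cur + b->cur_pos, n);
            total += n;  b->cur_pos += n;
            if (b->cur_pos == b->cur_size) { queue_put (b->empty_bufs, b->cur, b->cur_size);  b->cur = NULL; }
        }
        return total;
    }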

    This means that:
    1) the algorithms no longer run in lock-step; moreover, multiple instances of a single algorithm may run in parallel, as long as multiple buffers are available for the input and output data
    2) the extra memcpy is avoided, as long as the reader side calls RECEIVE_FILLED_BUF and then uses the buffer data directly instead of making the old READ call

    The remaining questions:
    1) who allocates/owns these buffers?
    2) is the reader thread allowed to modify the buffer contents while it holds the buffer?
    3) is the writer thread allowed to partially fill a buffer?
    4) what API does the entire pipe provide, i.e. how does it communicate with the surrounding threads (on compression: read files + hash data / [encrypt +] write to archive; on decompression: read archive [+ decrypt] / hash data + write to files)?

    Let's start with simpler ones:


    Is writer thread allowed to partially fill a buffer?

    By default, no. The buffer's owner belongs to one of the following categories:
    1) the buffer is a segment of a monolithic LZ dictionary or a similar monolithic memory area. The entire buffer should be filled on each read to ensure contiguous data - well, except when no more data are available.
    2) a blockwise algo like bwt or dict. It can deal with smaller buffers, but the compression ratio degrades significantly.
    3) a streaming algo with a blockwise implementation, such as my delta. Smaller buffers degrade the compression ratio insignificantly.
    4) a streaming algo whose compression ratio doesn't depend on the buffer size.

    One more point is that even a streaming algo may read data, f.e., in 32-bit units, so odd-sized buffers should be processed in an appropriate way. So, by default the answer is no, but the reader thread may provide hints allowing the writer thread to do it. NB: this means that the EMPTY_BUFS queue should contain {buf,size,hints} tuples.

    Note that this question doesn't make sense when buffers are owned by the writer thread. The owner can define buffer bounds any way it wants.


    Is reader thread allowed to modify buffer contents while it holds the buffer?

    By default, no. Again, think about LZ dictionary contents (owned by the writer thread). But the writer thread may grant such permission, and actually it's OK for most freearc algos. So the FILLED_BUFS queue should also contain {buf,size,hints} tuples. And again, this restriction applies only to buffers owned by the writer thread - the reader thread can do anything with buffers it owns.


  13. #9
    Bulat Ziganshin (Programmer)
    Who allocates/owns the buffers?

    At the end of the day, both the reader and the writer may propose to use their own buffers for data exchange. But the async queues between each pair can contain buffers from only one side. So the framework should decide (see the sketch below):
    1) if only the reader or only the writer provides its buffers, the decision is obvious
    2) if neither does, the framework should allocate the buffers itself
    3) if both do, the framework should choose one side and memcpy the data into local buffers on the other side (essentially, this is the implementation of the synchronous READ+WRITE calls on both sides described above; in the algorithm I described, the framework always chooses the writer's buffers to be sent over the queues, and the memcpy is performed in the reader thread)
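
    A hypothetical sketch of that decision, reusing the Boundary/queue helpers from the sketch in my previous post (the 3 x 256 KB choice matches the discussion further below):

    #include <stdlib.h>

    enum BufSource { FROM_WRITER, FROM_READER, FROM_FRAMEWORK };

    // Hypothetical sketch: decide whose buffers travel through the queues of one thread boundary
    static enum BufSource SetupBoundary (int writer_provides, int reader_provides, Boundary *b)
    {
        if (writer_provides && !reader_provides)  return FROM_WRITER;      // case 1: obvious
        if (reader_provides && !writer_provides)  return FROM_READER;      // case 1: obvious
        if (!writer_provides && !reader_provides)                          // case 2: framework allocates
        {
            for (int i = 0; i < 3; i++)
                queue_put (b->empty_bufs, malloc (256<<10), 256<<10);
            return FROM_FRAMEWORK;
        }
        return FROM_WRITER;   // case 3: both provide - send the writer's buffers, the reader memcpy's into its own
    }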

    Now let's see how the reader and writer can inform the framework that they provide some buffers for exchange. If the writer calls SEND_FILLED_BUF without a prior RECEIVE_EMPTY_BUF, this means the writer provides its own buffers, and the framework decides that the writer will continue to do the same (i.e. it will never try to receive more buffers than it has sent, so all received buffers will be the writer's own ones that were previously sent to the reader).

    If the writer calls RECEIVE_EMPTY_BUF first, the framework decides that the writer will always rely on borrowed buffers and use SEND_FILLED_BUF only to return the buffers it borrowed - in exactly the original order (well, we may add another hint allowing the reader/writer thread to return borrowed buffers in arbitrary order - it would be useful for m/t processing).

    The same applies to the reader thread - it calls either RECEIVE_FILLED_BUF or SEND_EMPTY_BUF first, declaring its intent to use borrowed buffers or provide its own ones.



    So, the framework should start all threads in the pipe and wait until both sides of each thread boundary declare their intents. Then, if both sides prefer to borrow, the framework should alloc these buffers itself. Let's choose the optimal buffer size and count:
    1) smaller buffers allow processing data in the cpu cache, without spilling to RAM. The cost of spilling, i.e. of memory reads/writes, is about 0.1-0.2 cpu cycles per byte
    2) larger buffers allow switching threads less frequently. AFAIK, each thread switch costs ~1000 cpu cycles, so with 256 KB buffers this means an overhead of about 1/256 of a cpu cycle per byte
    3) and of course, smaller buffers mean reduced memory overhead

    It seems that small buffers that can be kept inside cpu caches are preferable. Even if the data gets spilled out, the overhead of thread switching with a pretty small 256 KB buffer is negligible. Also note that low-end CPUs may have an LLC of only 1-2 MB, so even smaller 64 KB buffers may become optimal. OTOH, disk I/O speed is best at 1 MB chunks, pretty close to best at 256 KB chunks, but slower at 64 KB chunks.

    Now let's look at the buffer count. With 1 buffer, the reader and writer will work in lock-step, waiting for each other. With 2 buffers, they can work in parallel. More buffers allow smoothing out speed variations. Taking all that into account, I will choose 3 buffers of 256 KB each.



    Note that if there is a choice, borrowing buffers is preferable to providing them. If both sides choose to provide buffers, it means 2 buffers are allocated instead of 1, and an extra memcpy is required. OTOH, if both sides are ready to borrow buffers, this only means that the framework should allocate the buffers, while the efficiency of their use is kept at maximum.

    Overall, a thread should prefer to borrow buffers if it can work with buffers of any size/count, i.e. if it's essentially a single-threaded streaming algorithm that sequentially reads input data in bytes/words and/or writes output data in bytes/words. Note that each algorithm makes two independent decisions - whether to borrow/provide input buffers, and the same for output buffers.

    So, if a thread prefers to borrow buffers, this means that it's ready to work with buffers of any size, which justifies our idea of allocating small 256 KB buffers when both sides are borrowing. If a thread needs larger buffers or requires more buffers for m/t processing, it had better alloc the buffers itself, since the opposite thread may have its own idea of the optimal buffer configuration and provide its own buffers.



    Finally, some simple algorithms, including bcj and ciphers, may go further by providing buffers they have just borrowed on the other end! I.e. either
    Code:
    buf = BORROW_INBUF()
    process(buf)
    PROVIDE_OUTBUF(buf)
    or
    Code:
    buf = BORROW_OUTBUF()
    PROVIDE_INBUF(buf)
    (borrowing outbufs is preferable since they are never write-protected). This further reduces the amount of memory allocated and, more importantly, decreases cpu cache pressure. OTOH, this means that the fixed set of buffers will be shared among 3 threads instead of 2, limiting their ability to execute in parallel. If these buffers are provided by the framework, it can just allocate more buffers, but the algorithm can't know beforehand whether that will be the case.


    And a final note: when one side borrows buffers and the other side provides large buffers, the framework can chop the large buffers into small (256 KB) chunks. But it seems that this will not make any difference, except for pipes that include threads sharing buffers borrowed at the other end.

  14. #10
    Bulat Ziganshin (Programmer)
    What API does the entire pipe provide, i.e. how does it communicate with the surrounding threads?

    The low-level answer is straightforward. Internal communication between threads is handled by the framework, and only two parts remain - reading in the first algo and writing in the last algo. So the pipe forwards the remaining callbacks to the caller:
    MultiCompress("rep+lzma", this, [] (void* self /*this*/, int service...) {
        switch (service) {
            case RECEIVE_FILLED_BUF: case SEND_EMPTY_BUF:  // reading from the first algo...
            case RECEIVE_EMPTY_BUF:  case SEND_FILLED_BUF: // writing by the last algo...
        }
    });
    Note that when the MultiCompress function returns, the entire compression is finished.



    Another, high-level approach is to provide the same support for the external pipe boundaries as for the internal ones. This means that each external boundary also gets FILLED_BUFS and EMPTY_BUFS async queues. Reads by the first algo and writes from the last algo go into these queues in the usual way, i.e. using SEND/RECEIVE calls. Moreover, the external threads that provide data to the first algo and receive data from the last algo also employ the SEND/RECEIVE API.
    pipe = StartMultiCompress("rep+lzma");
    MultiCompressService (pipe, SEND_FILLED_BUF, source_data...);
    MultiCompressService (pipe, RECEIVE_FILLED_BUF, compressed_data...);
    ...
    Essentially, the last example is similar to the codec implementation, with "cb(ud, service...)" replaced by "MultiCompressService (pipe, service...)", where MultiCompressService is a global function provided by the piping framework and the pipe variable is a handle to the pipe started by StartMultiCompress. Once the pipe is started, you will probably pass its handle into two separate threads - one reading input data and sending it into the pipe, and another receiving compressed data from the pipe and using it as you wish.

    So, while internal boundaries transfer data between two compression algos, external boundaries transfer data between the application's reading thread and the first algo, as well as between the last algo and the application's writing thread. Just as for internal boundaries, on each external boundary the framework may have 0, 1 or 2 sides trying to provide their own buffers for exchange, and it should handle all combinations in the same way.

    In an ideal application, both the reader and writer threads use the buffer-borrowing API, so the compressors can choose the size and number of buffers that they prefer. The only exception is when files can be mmapped - in this case the reader and/or writer should provide mmapped buffers to their corresponding threads (rather than borrowing buffers from those threads), saving one more memcpy. Yes, this means that a pipe with a single compression algorithm can compress data directly from one mmapped buffer to another, without any memory allocation!!!


  16. #11
    Bulat Ziganshin (Programmer)
    I've finished development of the first CELS version and included it in my compression library, so it will be available in all my future releases of freearc, fa'next, unarc/dll/sfx and fazip. Now I'm working on the documentation and an example of a CELS plugin.

    Analyzing Squash vs CELS, I came to the conclusion that they are hardly competitors. Squash was developed for use in general-purpose applications. Its functionality is "wide" rather than "deep", i.e. it requires only a few basic services to be implemented by a codec, but OTOH it provides a lot of utility services on top of them. Just f.e., an application can printf to a logfile with automatic content compression and even iconv-based conversion. So I have an idea - probably many people look at this forum just to find a good compression library to build into their apps. And unlike people looking for zstd or zpaq, they don't even know the keyword to search for. So pinning a topic about Squash would greatly help them.

    On the opposite side, CELS provides almost no services of its own aside from the must-have features, i.e. loading, registering and dispatching codecs. But it allows codecs to implement a lot of fine-tuning functionality such as LimitCompressionMem or TailorToSmallInputSize. Clearly, it was made to make the full power of the FreeArc internal APIs available to external codecs. Using CELS, FreeArc can control memory/cpu usage, tailor block sizes, reuse allocated memory and use other very specific functions that may be interesting only for specialized compression applications.

    Additionally, CELS was simplified as much as possible, so it should be easy to quickly add minimal CELS support to your new codec.

    Overall, it looks like Squash is a good place for well-developed, production-quality codecs that may be used in general-purpose apps.

    OTOH, CELS is good for:
    1) new codecs - it should be easy to add CELS support, and once it's done, you can try your codec with all the features provided by freearc and my other programs, i.e. gathering files together into solid blocks, multi-threading with 4x4 and so on. This is a huge contrast to Squash, which provides only a very simple "standard app" for codec testing
    2) experimental/geeky codecs - anything providing 1% better compression for a 10x slowdown, or using gigabytes of RAM, will be more interesting for archiver users than for users of general-purpose apps. CELS provides codecs a broad set of APIs that can be used to fine-tune the algorithm for each specific setting

    In short: if you write an app, use Squash. When you really need CELS, you will know it without me. When you write a new or sophisticated codec, use CELS and you will get a good testing platform and interested users. When your codec is production-ready, use Squash to deliver it to a much broader audience.

  17. #12
    Shelwien (Administrator)
    What do you think about (de)coding progress estimation?

    In 7z, it's also a part of the standard codec API, but it seems that even the original 7z codecs (and the framework) misuse it.
    Atm, it looks like 7z reports the % of input of the first filter in the coding graph (relative to the total unpacked size) on encoding,
    and the % of input of the leaf filters (relative to the total packed size stored in the archive index) on decoding.

    So when we use complex configs with rep/split/x64flt/etc, it usually leads to encoding progress reaching 99%
    in a few seconds, then staying there for a long time.

    In the end, I think Igor's design idea was right - when some filters use large blocks, it becomes impossible
    to determine their progress just by looking at their input/output, so an explicit API is necessary.

    But what kind of interface it needs (and how the framework should compute the progress estimation for a whole filter graph)
    is still very uncertain to me.
    For example, with the current 7z api, we can report a "virtual input size" in the form of processed_input + current_block_size*block_progress,
    but can't it be something simpler?

    Also, even if we have just a single streamable codec, its processing speed doesn't really remain the same in many cases,
    and I think that archiver progress ideally should represent the remaining processing time, rather than the size of data.
    So we might have to add some analysis module for non-linear extrapolation of remaining processing time.

  18. #13
    Bulat Ziganshin (Programmer)
    1. When each method periodically reports the insize/outsize of its next chunk (even when that chunk isn't yet written to the output), it's possible to compute an estimate of the final output size. Look at the section with uiReadData.
    2. Knowing the size of the input data and the estimated size of the output data, I compute the "total percents processed" as a mix of 10% of the input percents processed plus 90% of the output percents processed. That's reasonable for the usual freearc compression chain, where the final lzma stage takes 90% of the compression time.
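
    As a sketch of that mix (the variable names are illustrative):

    // Hypothetical sketch: blend input and output progress into a single percentage
    double TotalProgress (double in_done, double in_total, double out_done, double est_out_total)
    {
        double in_pct  = in_total      ? in_done  / in_total      : 0;
        double out_pct = est_out_total ? out_done / est_out_total : 0;
        return 100.0 * (0.10*in_pct + 0.90*out_pct);    // 10% weight on input, 90% on output
    }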
