Thread: Please explain zpaq architecture for one who has never read the code

  1. #1
    Member | Join Date: Jan 2012 | Location: Pluto

    Please explain zpaq architecture for one who has never read the code

    Well, it would be nice if anybody could explain it to me.
    I'll start with how I understand it works; please correct me if I'm wrong.

    ZPAQ takes care of filenames, checksums, etc.
    It uses context mixing. Each model to be mixed is described by VM bytecode. There are three built-in bytecodes, but they are interchangeable. It usually makes no sense to create a model that predicts that the output will be pi, but if you do and the output actually is pi, the compression will be amazing. All these models are "symmetric" by their nature: the same code is used for compression and decompression.
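    To make the mixing step concrete, here is a minimal floating-point sketch of a logistic mixer in the style of the PAQ family: each model outputs P(next bit == 1), the mixer combines these in the logit ("stretch") domain with learned weights, and the weights are nudged after every bit. The names Mixer, stretch and squash and all constants are illustrative assumptions; ZPAQ's own mixer component is implemented differently (in integer arithmetic), so this shows the idea, not its code.

```python
import math


def stretch(p: float) -> float:
    """Logit: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Logistic function, the inverse of stretch()."""
    return 1.0 / (1.0 + math.exp(-x))


class Mixer:
    """Mixes several predictions of P(next bit == 1) into one, learning weights online."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.02) -> None:
        self.weights = [0.3] * n_inputs
        self.learning_rate = learning_rate
        self.inputs = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs: list[float]) -> float:
        # Weighted sum in the stretched (logit) domain, squashed back to a probability.
        self.inputs = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit: int) -> None:
        # Gradient step on coding cost: weights of inputs that pointed
        # toward the actual bit grow, the others shrink.
        err = bit - self.p
        self.weights = [w + self.learning_rate * err * x
                        for w, x in zip(self.weights, self.inputs)]
```

    Because the decoder sees the same bits in the same order, it can call predict() and update() exactly as the encoder did; that is what makes the scheme symmetric.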

    Then there are pre- and postprocessors. They differ from one another, but ZPAQ asserts that post(pre(data)) == data.
    These processors can transform files, e.g. BWT and its inverse BWT^-1, or FLAC encoding and decoding. More likely, though, the preprocessing step will undo an existing compression so the data can be compressed again, since many file formats use inefficient compression. For example, a PNG file is converted to a raw bitmap, because there are better predictors than the one PNG uses (is it a predictor?), and this is reversed later.
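    The post(pre(data)) == data requirement can be illustrated with the BWT pair mentioned above. Below is a naive Python sketch (it sorts all rotations, so it is O(n² log n)) that returns the last column of the sorted rotations plus the row index of the original; the inverse rebuilds the data from those two values. The function names are hypothetical, and this is not how zpaq or any particular preprocessor implements BWT.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Forward transform: last column of the sorted rotations, plus the row of the original."""
    n = len(data)
    if n == 0:
        return b"", 0
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Inverse transform via the LF mapping (stable sort of the last column)."""
    n = len(last)
    # order[j] is the row whose rotation starts one byte later than row j's.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    j = primary
    for _ in range(n):
        j = order[j]
        out.append(last[j])
    return bytes(out)
```

    For example, bwt(b"banana") gives (b"nnbaaa", 3), and inverse_bwt(b"nnbaaa", 3) gives back b"banana".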

    Can these processors be chained? Are they written in VM bytecode and included in the archive? (I guess not, since I saw C code; if not, how are they distributed and executed, how portable are they, and are they sandboxed?) What are configurations? Is there a mechanism that selects suitable processors? How do you write processors? And CM with just one static model is just Huffman coding, isn't it?
    Last edited by FritzStein; 14th January 2012 at 19:18.

  2. #2
    Shelwien, Administrator | Join Date: May 2008 | Location: Kharkov, Ukraine
    > Can these Processors be chained?

    Afaik it's not supported by the format, but it should be possible
    to write processors with multiple layers.
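    A processor "with multiple layers" can be sketched as folding several (forward, inverse) pairs into a single pre/post pair: the preprocessor applies the forward steps in order, and the postprocessor must undo them in reverse order. This is a Python illustration under that assumption; in zpaq the post side would have to be one ZPAQL program, and the layers used here (delta coding, byte reversal) and the helper names are hypothetical.

```python
from typing import Callable

Transform = Callable[[bytes], bytes]


def delta_encode(data: bytes) -> bytes:
    """Replace each byte by its difference from the previous byte (mod 256)."""
    out, prev = bytearray(), 0
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Inverse of delta_encode: running sum mod 256."""
    out, prev = bytearray(), 0
    for d in data:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)


def reverse_bytes(data: bytes) -> bytes:
    """A trivially self-inverse layer, used only to show that order matters."""
    return data[::-1]


def chain(layers: list[tuple[Transform, Transform]]) -> tuple[Transform, Transform]:
    """Fold (forward, inverse) layers into one pre/post pair.
    pre applies the forward steps in order; post undoes them in reverse order."""
    def pre(data: bytes) -> bytes:
        for forward, _ in layers:
            data = forward(data)
        return data

    def post(data: bytes) -> bytes:
        for _, inverse in reversed(layers):
            data = inverse(data)
        return data

    return pre, post
```

    Applying the inverses in the same order as the forward steps would generally not restore the input, which is why post walks the layer list backwards.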

    > Are they written in VM bytecode and
    > included in the archive? (I guess not, since I saw C-Code;

    Only the post-processors for decoding are added to the archive as bytecode.
    Afaik, preprocessors have to be implemented as external tools.

    > if not: how are they distributed, executed and how portable are they,
    > and are they sandboxed?)

    zpaq bytecodes should be safe (I didn't check them for exploits though).

    And the main idea is to build a forward-compatible format, so that
    an old decoder would be able to decode any future extensions.
    Thus encoder implementations are the responsibility of their developers.
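    The forward-compatibility idea (the decoding program travels inside the archive, so the decoder itself never has to change) can be shown with a toy. In this sketch the decoder only knows a tiny instruction set, and each "newer" encoder ships the program that undoes its own transform. The opcodes, the block layout [program length][program][payload] and all function names are invented for this illustration and have nothing to do with ZPAQL's real instruction set or the ZPAQ block format.

```python
# Opcodes of a made-up toy instruction set (NOT ZPAQL).
OP_ADD = 0x01       # OP_ADD k : a = (a + k) mod 256
OP_XOR = 0x02       # OP_XOR k : a = a ^ k
OP_ADD_PREV = 0x03  # a = (a + previous output byte) mod 256


def decode_block(block: bytes) -> bytes:
    """Generic 'old' decoder: knows the instruction set, not any particular format."""
    prog_len = block[0]
    program = block[1:1 + prog_len]
    payload = block[1 + prog_len:]
    out, prev = bytearray(), 0
    for byte in payload:
        a, pc = byte, 0
        while pc < len(program):
            op = program[pc]
            if op == OP_ADD_PREV:
                a = (a + prev) & 0xFF
                pc += 1
            elif op in (OP_ADD, OP_XOR) and pc + 1 < len(program):
                k = program[pc + 1]
                a = (a + k) & 0xFF if op == OP_ADD else a ^ k
                pc += 2
            else:
                raise ValueError(f"bad instruction at offset {pc}")
        out.append(a)
        prev = a
    return bytes(out)


# Two "newer" encoders; each ships the program that undoes its own transform.
def encode_delta(data: bytes) -> bytes:
    program = bytes([OP_ADD_PREV])
    payload, prev = bytearray(), 0
    for b in data:
        payload.append((b - prev) & 0xFF)
        prev = b
    return bytes([len(program)]) + program + bytes(payload)


def encode_xor_add(data: bytes, key: int = 0x5A, shift: int = 3) -> bytes:
    program = bytes([OP_XOR, key, OP_ADD, shift])
    payload = bytes(((b - shift) & 0xFF) ^ key for b in data)
    return bytes([len(program)]) + program + payload
```

    decode_block was written before either encoder "existed", yet it decodes both, and it rejects unknown instructions instead of guessing.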

    > What are configurations?

    Config-files describe the decoding algorithm - the model and
    the postprocessor.

    > Is there a mechanism that selects fitting processors?

    Afaik, not atm.

    > How to write processors?

    RTFM and write?

    Although afaik "processors" are not the main point of the zpaq format;
    it's supposed to be all about context models, and preprocessors are not
    supposed to be anything complex.

    Also, programming in the zpaq language is basically assembly programming,
    so it kinda makes sense to forget about zpaq if you just want to write
    a new compressor.
    zpaq would be relevant only at the point where you have a stable working
    codec and (for some weird reason) want the zpaq decoder to support it.

    > And CM with just one static model is just Huffman coding, isn't it?

    zpaq1 always uses arithmetic coding.
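    Why this matters for the Huffman question: a Huffman code spends at least one whole bit per symbol, so with a binary alphabet it cannot compress at all, while an arithmetic coder can spend a small fraction of a bit on a confidently predicted bit. Below is a sketch of a carry-less 32-bit binary arithmetic coder in the style of the PAQ family, driven by a simple adaptive bitwise order-0 model. The 12-bit probability scale, the adaptation rate and all names are assumptions for illustration, not zpaq's actual code.

```python
def update(p: int, bit: int) -> int:
    """Move a 12-bit probability toward the observed bit; keep it in 1..4095."""
    p += ((bit << 12) - p) >> 5
    return min(4095, max(1, p))


class Encoder:
    def __init__(self) -> None:
        self.x1, self.x2 = 0, 0xFFFFFFFF  # current range [x1, x2]
        self.out = bytearray()

    def encode(self, bit: int, p1: int) -> None:
        """Code one bit; p1 is P(bit == 1) scaled to 12 bits."""
        xmid = self.x1 + ((self.x2 - self.x1) >> 12) * p1
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        # Shift out leading bytes once both ends of the range agree on them.
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.out.append(self.x2 >> 24)
            self.x1 = (self.x1 << 8) & 0xFFFFFFFF
            self.x2 = ((self.x2 << 8) & 0xFFFFFFFF) | 0xFF

    def flush(self) -> bytes:
        self.out += self.x1.to_bytes(4, "big")
        return bytes(self.out)


class Decoder:
    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0
        self.x1, self.x2, self.x = 0, 0xFFFFFFFF, 0
        for _ in range(4):
            self.x = (self.x << 8) | self._next_byte()

    def _next_byte(self) -> int:
        if self.pos < len(self.data):
            self.pos += 1
            return self.data[self.pos - 1]
        return 0

    def decode(self, p1: int) -> int:
        # Mirrors Encoder.encode exactly, using x to decide which half was chosen.
        xmid = self.x1 + ((self.x2 - self.x1) >> 12) * p1
        bit = 1 if self.x <= xmid else 0
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.x1 = (self.x1 << 8) & 0xFFFFFFFF
            self.x2 = ((self.x2 << 8) & 0xFFFFFFFF) | 0xFF
            self.x = ((self.x << 8) & 0xFFFFFFFF) | self._next_byte()
        return bit


def compress(data: bytes) -> bytes:
    probs, enc = [2048] * 256, Encoder()
    for byte in data:
        ctx = 1  # bits of the current byte seen so far, with a leading 1
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            enc.encode(bit, probs[ctx])
            probs[ctx] = update(probs[ctx], bit)
            ctx = (ctx << 1) | bit
    return enc.flush()


def decompress(blob: bytes, n: int) -> bytes:
    probs, dec, out = [2048] * 256, Decoder(blob), bytearray()
    for _ in range(n):
        ctx = 1
        for _ in range(8):
            bit = dec.decode(probs[ctx])
            probs[ctx] = update(probs[ctx], bit)
            ctx = (ctx << 1) | bit
        out.append(ctx & 0xFF)
    return bytes(out)
```

    On a stream like b"aaaaaaaaab" repeated, most bits are almost certain, so the output ends up far below one bit per input bit, which no Huffman code over bits could achieve.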

  3. #3
    Matt Mahoney, Expert | Join Date: May 2008 | Location: Melbourne, Florida, USA
    http://mattmahoney.net/dc/dce.html#Section_437
    has a section that describes ZPAQ. It might help to read the earlier sections for background material on context mixing, arithmetic coding, etc. The book describes an earlier version where I optimized ZPAQL code in config files by translating it to C++ and recompiling. The newest version translates directly to x86, so it is all transparent to the user (and just as fast). It will work on non-x86 processors, but there the ZPAQL is interpreted, so it won't be as fast.

    ZPAQL is used in two places: to compute contexts, and sometimes for postprocessing. If you write a postprocessor, then you also need to write a preprocessor (in any language), which will be called by the zpaq program. At compress time, zpaq checks that the postprocessor restores the original file correctly (unless you use -q to skip the test).
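    The "compute contexts" role can be pictured as a small program that is called once per coded byte and returns one context per model component. The Python class below is a hypothetical stand-in for such a program (an order-1 and an exact order-2 context, no hashing); it is not ZPAQL and not the interface zpaq uses. The point is that it only looks at bytes already coded, so the decoder can recompute identical contexts from the bytes it has already decoded.

```python
class Order12Contexts:
    """Hypothetical context program: after each coded byte it returns
    one context per model component (order-1 and order-2 here)."""

    def __init__(self) -> None:
        self.c1 = 0  # last byte
        self.c2 = 0  # byte before that

    def update(self, byte: int) -> tuple[int, int]:
        self.c2, self.c1 = self.c1, byte
        order1 = self.c1
        order2 = (self.c2 << 8) | self.c1
        return order1, order2


def contexts_for(data: bytes) -> list[tuple[int, int]]:
    """Contexts in effect before each byte is coded (the first byte sees (0, 0))."""
    ctx = Order12Contexts()
    result, current = [], (0, 0)
    for byte in data:
        result.append(current)
        current = ctx.update(byte)
    return result
```

    For b"abc" this yields [(0, 0), (97, 97), (98, 24930)], and the contexts for a prefix never depend on bytes that come later.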

    ZPAQL is a sandboxed virtual machine designed to be fast and compact, not easy to program in. Most users will just use the built-in models in zpaq (-m1 through -m4). For the ones that use postprocessors (-m1 and -m2), the preprocessor is built in, so you don't have to worry about it. Config files and external preprocessors are only for compression experts developing new algorithms.

    ZPAQL bytecode should be safe. There are no instructions to access the disk, the OS, or memory outside the predefined arrays. It is open source to encourage people to look for exploits. Some known problems with archivers in general are that you can create archives that create or overwrite arbitrary files when extracted, and that you can create a small archive that fills up the disk. But these problems are not unique to ZPAQ.
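    One simple way a VM can guarantee "no memory outside the predefined arrays" is to give each array a power-of-two size and mask every address, so any computed index wraps around instead of escaping. The sketch below assumes that approach (it is our understanding of how ZPAQL addressing behaves, but treat it as an assumption); the class name MaskedArray is hypothetical.

```python
class MaskedArray:
    """Byte array of size 2**size_bits; every index is wrapped with a mask,
    so no address a program computes can reach outside the array."""

    def __init__(self, size_bits: int) -> None:
        self.mask = (1 << size_bits) - 1
        self.cells = bytearray(1 << size_bits)

    def __getitem__(self, addr: int) -> int:
        return self.cells[addr & self.mask]

    def __setitem__(self, addr: int, value: int) -> None:
        # Values are truncated to one byte as well, so nothing can overflow a cell.
        self.cells[addr & self.mask] = value & 0xFF
```

    With a 16-cell array, writing to address 17 lands in cell 1 and address -1 maps to cell 15; neither raises an error or touches memory outside the array.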

    Each ZPAQL virtual processor is single-threaded, but the ZPAQ archive format allows multiple blocks to be compressed or decompressed independently in parallel. A block can be one or more files, or part of a file. There is a tradeoff, though, because larger blocks usually compress better.
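    The block-level parallelism and its tradeoff can be demonstrated with any compressor whose blocks are independent. This sketch swaps in zlib (not ZPAQ's context mixing) and a thread pool; the helper names are hypothetical. Each block starts from an empty model, so splitting into many small blocks gains parallelism but loses the redundancy shared across block boundaries.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_parallel(data: bytes, block_size: int, workers: int = 4) -> list[bytes]:
    # Each block is compressed independently; pool.map preserves block order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split_blocks(data, block_size)))


def decompress_parallel(blocks: list[bytes], workers: int = 4) -> bytes:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))
```

    On highly repetitive text, the total size of many 4 KiB blocks is noticeably larger than compressing the whole input as one block, which is the tradeoff described above.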
