
Thread: PAQ8 Byte Prediction Idea

  1. #1
    Member Howard Jiang (China, joined Jan 2020)

    PAQ8 Byte Prediction Idea

    As a beginner in compression algorithms, I am interested in studying high-compression-ratio algorithms such as PAQ8 and cmix. I find that PAQ8 uses many models to predict each bit, which makes it slow. After reviewing the code, I see that most models put the byte contexts (updated once per byte) into different maps to compute the prediction, and then update the information in the map on every bit. So I have an idea: if we use the byte contexts to predict the next byte directly, we could greatly increase the speed. Unlike PPM, we would still use context models and a mixer to improve the accuracy of prediction for different types of data.
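
    To illustrate what I mean, here is a minimal sketch (all names are hypothetical, not actual PAQ8 code): the byte context is looked up once per byte, and all eight bit probabilities are then derived from the stored 256-entry distribution, so the arithmetic coder can still work bit by bit.

    Code:
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct ByteModel {
        // context hash -> counts of each byte value seen under this context
        std::unordered_map<uint64_t, std::vector<uint16_t>> table;

        // Fetch the counter line once per byte.
        std::vector<uint16_t>& line(uint64_t ctx) {
            auto& v = table[ctx];
            if (v.empty()) v.assign(256, 1);          // +1 smoothing
            return v;
        }

        // P(bit==1) for the next bit: 'partial' holds the bits of the
        // current byte seen so far, 'nbits' is how many (0..7).
        static double p1(const std::vector<uint16_t>& c, int partial, int nbits) {
            uint32_t ones = 0, total = 0;
            int lo  = partial << (8 - nbits);         // first byte matching the prefix
            int hi  = lo + (1 << (8 - nbits));        // one past the last
            int mid = lo + (1 << (7 - nbits));        // from here on the next bit is 1
            for (int b = lo; b < hi; b++) {
                total += c[b];
                if (b >= mid) ones += c[b];
            }
            return double(ones) / total;              // total >= 2 by construction
        }
        // After the byte is known: increment c[byte] once, halving all
        // counts occasionally to stay adaptive.
    };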

    I am not sure whether such an idea or algorithm for byte prediction already exists (combining PAQ8 models with byte prediction). I also notice that it may decrease the compression ratio, but I do not know by how much.

    Thanks

  2. #2
    Administrator Shelwien (Kharkov, Ukraine, joined May 2008)
    > I find that PAQ8 uses many models to predict each bit, which makes it slow.

    Compared to NNCP in single-threaded mode, even cmix is quite fast.

    > if we use the byte contexts to predict the next byte directly, we could greatly increase the speed.

    Maybe you mean this: http://mattmahoney.net/dc/dce.html#Section_435

    As for increasing the speed, that is only possible by disabling most submodels
    while the match model is giving good predictions.
    Some paq derivatives actually do that (e.g. nzcc), but it's a risky optimization -
    for example, imagine what happens if we skip a few bytes in a jpeg file.
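
    A rough sketch of the gating idea (all names hypothetical, nzcc's actual logic differs; a plain average stands in for the real mixer):

    Code:
    #include <vector>

    // 12-bit probabilities (0..4095), as in paq.
    struct Model { virtual int p() const = 0; virtual ~Model() = default; };

    struct MatchModel : Model {
        int prob = 2048, len = 0;       // current prediction, current match length
        int p() const override { return prob; }
    };

    // While the match model has a long confirmed match, skip the expensive
    // submodels entirely; otherwise fall back to the full CM path.
    int predict(const MatchModel& mm, const std::vector<const Model*>& sub) {
        const int kMinLen = 32;         // the risky tunable threshold
        if (mm.len >= kMinLen)
            return mm.p();              // fast path: match model alone
        long sum = mm.p();              // slow path: query every submodel
        for (const Model* m : sub) sum += m->p();
        return int(sum / long(1 + sub.size()));
    }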

    > Unlike PPM, we would still use context models and a mixer to improve
    > the accuracy of prediction for different types of data.

    Btw, it's possible to combine PPM with paq: https://encode.su/threads/2515-mod_ppmd

    > I also notice that it may decrease the compression ratio, but
    > I do not know by how much.

    That depends on the specific implementation details.
    We'd have to identify the submodels which can't be skipped,
    the match-model threshold for enabling/disabling the other submodels, etc.

    Also, the same effect can be reached by external preprocessing
    with existing tools (precomp, srep) - if we remove the long dups
    from the data beforehand, there'd also be no speed improvement left
    from special handling of these dups in paq8.

    In any case, paq8 and cmix are bad targets for speed optimizations
    at the cost of compression.

  3. #3
    Member Howard Jiang (China, joined Jan 2020)
    Thanks for your advice. It is certainly a risky optimization. When I tried to use an FPGA to speed up PAQ8, I found that there are too many memory accesses in the ContextMap. For example, when I compress 8 KB of data with 10 models (say 100 contexts in total using the context map), I have to query the context map 100 contexts * 8 bits * 8192 bytes = 6553600 times. The cost of these memory accesses is very high because of the hash-table reads in the ContextMap. That is why I am considering byte prediction.
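
    A back-of-the-envelope check of that count, and of what a bytewise lookup would save (just the arithmetic from the example above):

    Code:
    #include <cstdio>

    int main() {
        const long contexts = 100, bits = 8, bytes = 8 * 1024;
        long bitwise  = contexts * bits * bytes;   // 6553600 hash-table probes
        long bytewise = contexts * bytes;          // 819200 probes, 8x fewer
        std::printf("%ld vs %ld probes\n", bitwise, bytewise);
        return 0;
    }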

    So are there any good methods to reduce the memory-access cost in PAQ8? Or is there any kind of map which can predict all 8 bits with only one memory access?

  4. #4
    Member (China, joined Feb 2020)
    For commercial viability, implementing cmix on an FPGA is a good idea. As far as I know, it is easy to scale up the computing power in hardware. However, reducing the number of DRAM/SRAM memory accesses is critical, since that determines the final compression latency. I have an idea: encode each byte with only one memory access, and calculate the probability of each bit within the byte from the result of that single memory query. Does anyone have a more specific method or recommended models?
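
    One possible shape of that (a hypothetical sketch, not any existing model): each context slot stores a complete binary tree of 255 bit counters in one contiguous record, so a single burst read per byte brings in everything needed for all eight bit predictions.

    Code:
    #include <cstdint>
    #include <vector>

    struct Slot { uint8_t cnt[255][2]; };   // node i: counts of zeros and ones

    struct ByteMap {
        static const uint32_t N = 1 << 18;  // slot count, sized for illustration (~130 MB)
        std::vector<Slot> slots = std::vector<Slot>(N);

        Slot fetch(uint64_t ctxHash) const {        // the single memory access per byte
            return slots[ctxHash % N];
        }
        // Walk the tree locally: the node for the bit following 'partial'
        // (nbits bits already seen) is (1 << nbits) - 1 + partial.
        static double p1(const Slot& s, int partial, int nbits) {
            const uint8_t* c = s.cnt[(1 << nbits) - 1 + partial];
            return (c[1] + 0.5) / (c[0] + c[1] + 1.0);
        }
    };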

  5. #5
    Administrator Shelwien (Kharkov, Ukraine, joined May 2008)
    > For example, when I compress 8 KB of data with 10 models (say 100 contexts in total using the context map),
    > I have to query the context map 100 contexts * 8 bits * 8192 bytes = 6553600 times.
    > The cost of these memory accesses is very high because of the hash-table reads in the ContextMap.

    Yes, but you have to understand that it's the main point of CM (context mixing) compression algorithms.

    PAQ compression is good specifically because it has hundreds of different submodels,
    each producing its own predictions.

    Otherwise PAQ is very simple (compared to advanced LZ with parsing optimization), which is an advantage.

    And hardware might be able to provide its own solutions...
    At the least, it would be possible to execute the submodels in parallel
    (which is hard on a PC because of the high cost of thread synchronization),
    and the hashtable could potentially be replaced with some kind of associative memory,
    or at least with individual blocks of SRAM per submodel.
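
    In software terms the parallel part would look something like this (a sketch with hypothetical names; on a CPU the per-bit join is exactly the synchronization cost mentioned above):

    Code:
    #include <thread>
    #include <vector>

    struct SubModel { virtual int p() const = 0; virtual ~SubModel() = default; };

    // Evaluate all submodels for one bit in parallel, then hand the
    // predictions to the mixer.
    std::vector<int> predictAll(const std::vector<const SubModel*>& models) {
        std::vector<int> out(models.size());
        std::vector<std::thread> pool;
        for (size_t i = 0; i < models.size(); i++)
            pool.emplace_back([&, i] { out[i] = models[i]->p(); });
        for (auto& t : pool) t.join();  // barrier once per bit - the bottleneck
        return out;
    }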

    Alternatively, you can consider this kind of approach:
    https://encode.su/threads/541-Simple...xt-mixing-demo
    i.e. dynamic rebuilding of probability distributions (see the sketch under point 3 below).
    I suppose it might work well with hardware and small blocks.

    > That is why I am considering byte prediction.

    You can see here: http://mattmahoney.net/dc/text.htm
    that "durilca" (preprocessors + ppmonstr; ppmonstr is ppmd + SSE)
    gets better results than some of the PAQ versions, despite being
    a purely bytewise compressor.

    Also, there are speed-optimized paq versions (lpaq, paq9, etc.).

    So it's not impossible.
    But unfortunately, at the moment it's up to your own research
    if you want to reach paq8's compression level with a lighter memory load.

    > So are there any good methods to reduce the memory-access cost in PAQ8?

    1. To avoid allocating 1 GB of RAM to compress 8 KB of data,
    it's possible to replace the hashtables with trees.
    That would increase the number of memory accesses, though.

    2. The most common solution is simply removing the submodels with low gain,
    and enabling some submodels only for the corresponding data types.

    3. Instead of a hashtable, probabilities can be dynamically recomputed
    from a rescan of the actual data, as sketched below.
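
    A sketch of point 3 (hypothetical code, small blocks only): no hash table at all - rescan the already-processed data for the current context and count what followed it.

    Code:
    #include <cstdint>
    #include <vector>

    // Next-byte distribution after an 'order'-byte context, rebuilt by
    // scanning the buffer. O(n) per symbol, so only viable for small blocks.
    std::vector<uint32_t> rebuild(const std::vector<uint8_t>& buf,
                                  size_t pos, int order) {
        std::vector<uint32_t> counts(256, 1);       // +1 smoothing
        if (pos < (size_t)order) return counts;
        for (size_t i = order; i < pos; i++) {
            bool match = true;
            for (int k = 1; k <= order; k++)
                if (buf[i - k] != buf[pos - k]) { match = false; break; }
            if (match) counts[buf[i]]++;            // byte that followed the context
        }
        return counts;
    }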

    > Or is there any kind of map which can predict all 8 bits with only one memory access?

    You have to understand that 8 bits have 256 different values,
    and we need a probability for each of them.

    So yes, it's possible to make a match model which would only predict one specific byte value
    and store only one probability per context.
    But it's impossible to achieve a good compression ratio with only that kind of model.
    Even PPM keeps up to 256 counters per context, even if that counter table
    is fetched once per byte rather than once per bit.
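
    Such a match-only model could look like this (a hypothetical sketch: one position and one 12-bit confidence per hashed context, a single small fetch per byte - and exactly the limitation described above):

    Code:
    #include <cstdint>
    #include <vector>

    struct MatchOnly {
        static const uint32_t N = 1 << 20;
        static const uint32_t NONE = 0xFFFFFFFF;
        std::vector<uint32_t> lastPos = std::vector<uint32_t>(N, NONE);
        std::vector<uint16_t> conf    = std::vector<uint16_t>(N, 2048);
        std::vector<uint8_t>  hist;                 // data processed so far

        // Predicted next byte (or -1) plus the model's confidence in it.
        int predict(uint64_t ctxHash, int& p) const {
            uint32_t h = ctxHash % N, pos = lastPos[h];
            p = conf[h];
            return (pos != NONE && pos < hist.size()) ? hist[pos] : -1;
        }

        void update(uint64_t ctxHash, uint8_t actual, int predicted) {
            uint32_t h = ctxHash % N;
            if (predicted >= 0)                     // adapt confidence toward hit/miss
                conf[h] += ((predicted == actual ? 4095 : 0) - conf[h]) >> 5;
            lastPos[h] = hist.size();               // where this context's next byte lands
            hist.push_back(actual);
        }
    };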

    There are also asymmetric methods (like modern LZs) which significantly reduce
    memory usage on the decoding side.
    But in exchange, the encoding side becomes much more complex, slower, and requires even
    more memory accesses.


