
Thread: Understanding PAQ

  1. #1
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts

    Understanding PAQ

    Here is a program I wrote, based on my refactor of paq8px. It takes one parameter from the command line, which is just a file to process. Like paq8px, it reads the file bit-by-bit and passes those bits to a context model. Unlike paq8px, it doesn't create an output file. It simply records the input bits and the predictions, and for each bit that isn't perfectly predicted, it outputs the bit position, the bit, and the prediction. It also outputs the average prediction error across all bits in the file.

    Code:
    #include "Mixer.hpp"
    #include "MixerFactory.hpp"
    #include "ModelStats.hpp"
    #include "file/FileDisk.hpp"
    #include "file/FileName.hpp"
    #include "file/fileUtils2.hpp"
    #include "model/ContextModel.hpp"
    #include <cstdint>
    #include "Models.hpp"
    
    auto main(int argc, char *argv[]) -> int {
      auto shared = Shared::getInstance();
      shared->chosenSimd = SIMD_AVX2;
      shared->setLevel(9);
      FileName input;
      input += argv[1];
      FileDisk f;
      f.open(input.c_str(), true);
      uint64_t fSize = getFileSize(input.c_str());
      auto *modelStats = new ModelStats();
      auto *models = new Models(modelStats);
      ContextModel contextModel(modelStats, *models);
      auto updateBroadcaster = UpdateBroadcaster::getInstance();
      auto programChecker = ProgramChecker::getInstance();
      shared->reset();
      shared->buf.setSize(shared->mem * 8);
      int c = 0;
      uint8_t y = 0;
      auto results = static_cast<uint16_t *>(malloc(8 * fSize * sizeof(uint16_t)));
      auto ys = static_cast<uint8_t *>(malloc(8 * fSize * sizeof(uint8_t)));
      uint64_t position = 0;
      for( int j = 0; j < fSize; ++j ) {
        c = f.getchar();
        for( int i = 7; i >= 0; --i ) {
         auto p = contextModel.p();
         results[position] = p;
         y = (c >> i) & 1U;
          shared->y = y;
          shared->update();
          ys[position] = y;
          ++position;
          updateBroadcaster->broadcastUpdate();
        }
      }
    
      uint64_t sum = 0;
      for( uint64_t i = 0; i < position; ++i ) {
        y = ys[i];
        uint16_t target = y == 0 ? 0 : 4095;
        if( target != results[i] ) {
          printf("%llu, %d, %d\n", i, target, results[i]);
        }
        sum += abs(target - results[i]);
      }
      printf("(%llu - %llu) %f\n", 0ULL, position, double(sum) / double(position));
      programChecker->print();
      return 1;
    }
    ​
    Notice that the prediction is made before the model is updated with the next bit, such that the prediction of the first bit is made after having seen 0 bits, the prediction of the second bit is made after having seen 1 bit, and so on. Is this correct?

    Here are the first few lines of output from compressing enwik8:

    Code:
    0, 0, 2047
    1, 0, 2047
    2, 4095, 2047
    3, 4095, 2047
    4, 4095, 2047
    5, 4095, 2043
    6, 0, 2043
    7, 0, 2043
    8, 0, 2043
    9, 4095, 2047
    10, 4095, 2047
    and here are the last few lines:
    Code:
    799999975, 0, 3753
    799999979, 0, 11
    799999980, 0, 2208
    799999981, 0, 438
    799999987, 4095, 3199
    799999988, 0, 1
    799999989, 0, 1546
    799999991, 4095, 3909
    799999993, 0, 5
    799999996, 0, 6
    (0 - 800000000) 295.095109
    Time 26567.74 sec, used 3814 MB (3999353079 bytes) of memory
    Notice that some bit positions aren't printed (e.g. 799999976, 799999977, etc) -- this means that they were perfectly predicted.

    I have more questions but this post is long enough as it is -- I'll save those for later posts in this thread.

  2. #2
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    Maybe you are already past this stage, but for me it helped to write a floating-point version of the paq mixer, stretch, squash, etc. in order to understand how it works. The integer version with bit hacks is less readable and harder to follow.

    Also: even if you are able to predict something 'perfectly', you still have to let the decoder know that. To the range encoder it is just a near-perfect prediction, costing a fraction of a bit, but you have to encode it anyway in order to be able to decode it later, because your decoder cannot know whether the prediction was perfect or very wrong. (Both loops are sketched in code after the steps below.)

    So for encoding:
    1. Make prediction
    2. Feed the range encoder the prediction (p) and the actual bit (y).
    3. Update the models with the actual bit.
    4. Go to 1.

    For decoding:
    1. Make the prediction
    2. Read the actual bit from the range encoder with the prediction (p) as parameter.
    3. Update models with the actual bit.
    4. Go to 1.
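
    Something like this, as a minimal sketch. Predictor, ArithmeticEncoder and ArithmeticDecoder are hypothetical placeholder interfaces, not the actual paq8px classes; only the predict -> code -> update ordering matters here.

    Code:
    #include <cstdint>
    
    struct Predictor {
      uint16_t p();          // P(next bit == 1), scaled to 0..4095
      void update(int bit);  // train the models with the actual bit
    };
    
    struct ArithmeticEncoder {
      void encodeBit(int bit, uint16_t p);  // costs about -log2(P(bit)) bits
    };
    
    struct ArithmeticDecoder {
      int decodeBit(uint16_t p);            // returns the bit that was encoded
    };
    
    void encode(Predictor &pr, ArithmeticEncoder &ac, const uint8_t *data, uint64_t nBits) {
      for (uint64_t i = 0; i < nBits; ++i) {
        uint16_t p = pr.p();                            // 1. make the prediction
        int bit = (data[i >> 3] >> (7 - (i & 7))) & 1;  //    actual bit, MSB first
        ac.encodeBit(bit, p);                           // 2. feed the coder p and y
        pr.update(bit);                                 // 3. update the models
      }                                                 // 4. repeat
    }
    
    void decode(Predictor &pr, ArithmeticDecoder &ac, uint8_t *data, uint64_t nBits) {
      for (uint64_t i = 0; i < nBits; ++i) {
        uint16_t p = pr.p();                            // 1. make the same prediction
        int bit = ac.decodeBit(p);                      // 2. read the bit using p
        data[i >> 3] |= bit << (7 - (i & 7));           //    rebuild the byte (buffer assumed zeroed)
        pr.update(bit);                                 // 3. update the models
      }                                                 // 4. repeat
    }
    The decoder makes exactly the same sequence of predictions because it updates with the same bits, which is why even "perfectly predicted" bits still have to pass through the coder.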

  3. #3
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    @Kaw: it's not about understanding how logistic mixing works etc. - we do know that already.
    It's about refactoring paq8px without compression loss.
    paq8 has many obscure tricks and dependencies, so there's a big step between "order-N bitwise CM with logistic mixing" and "paq8" -
    Matt himself tried making it more modular (paq9, zpaq) but got a worse compression ratio.

    The start of this discussion is actually here: https://encode.su/threads/342-paq8px...ll=1#post62902

    @Eppie: I don't think it's a good idea to skip positions, it just complicates the log structure for no good reason.
    Also, would it be possible to log predictions of individual submodels?..

  4. #4
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    Oh okay... maybe that's something to mention in the topic start?
    Now it reads like someone tried to understand Paq8 (on a basic level) by rewriting it for personal use. At least to me... Anyway, thanks for the info.

  5. #5
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    @Kaw: Sorry I wasn't clear. The driving factor behind my wanting to understand PAQ is to refactor it to make it more modular, and thus more easily understood and easier to contribute to, as @Shelwien stated. I started this new thread because I didn't want to clutter the paq8px thread even more. I do have some questions about the architecture of the mixer, but I will save those for another post.

    @Shelwien: I can change the output to include every position; that's easy enough. When you say log the prediction of individual submodels, are you talking about the MatchModel, NormalModel, etc? If so, yes, that's rather easy, and here are the last 30 predictions from just the MatchModel for enwik8:

    Code:
    799999970: 1, 3848
    799999971: 0, 816
    799999972: 1, 1095
    799999973: 0, 2688
    799999974: 0, 258
    799999975: 0, 2845
    799999976: 0, 0
    799999977: 1, 4072
    799999978: 1, 4093
    799999979: 0, 101
    799999980: 0, 646
    799999981: 0, 607
    799999982: 0, 1
    799999983: 1, 4095
    799999984: 0, 0
    799999985: 1, 4089
    799999986: 1, 4093
    799999987: 1, 3199
    799999988: 0, 11
    799999989: 0, 460
    799999990: 1, 4057
    799999991: 1, 3734
    799999992: 0, 0
    799999993: 0, 87
    799999994: 1, 4092
    799999995: 0, 19
    799999996: 0, 196
    799999997: 0, 11
    799999998: 0, 0
    799999999: 0, 0
    (0 - 800000000) 419.493850
    Time 483.81 sec, used 419 MB (439973552 bytes) of memory

  6. #6
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    I got the repository code to compile on Windows with some patches.
    Surprisingly, clang was the hardest to get working, and in the end the clang version doesn't work (it crashes).
    There was some really weird stuff like #include <...> within a function.
    Also, somehow it now doesn't compile without AVX2 and C++17 - I think that's a bad development, since the original worked with C++98 and no vectors at all.
    (Also, x86 is not the only CPU.)

    But the main problem is this:
    Code:
    book.paq8px183fix1 -7 184531
    book.paq8px184 -7     185858
    Attached Files

  7. #7
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    Ported a version of the HTML markup utility for this logger.
    Redder letters have longer codes.
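
    For reference, the coloring can be derived from the logger output roughly like this (a sketch of the assumed idea, not the actual log2htm source): the ideal code length of a bit y under prediction p is -log2(P(y)), and a longer code maps to a redder shade.

    Code:
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    
    // p is the prediction that the bit is 1, in 0..4095, as in the logger output.
    double codeLengthBits(int y, uint16_t p) {
      double p1 = (p + 0.5) / 4096.0;        // keep away from exact 0 and 1
      double py = y ? p1 : 1.0 - p1;         // probability assigned to the actual bit
      return -std::log2(py);
    }
    
    // Map a byte's total code length to a red intensity 0..255
    // (arbitrary scale: ~12 bits or more renders fully red).
    uint8_t redIntensity(double bitsForByte) {
      double r = std::min(bitsForByte / 12.0, 1.0);
      return static_cast<uint8_t>(r * 255.0);
    }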
    [Image attachment: book1a.png]
    Attached Files

  8. #8
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    I live in GMT-05:00, so it is currently quite late for me. I will begin to address your concerns now, and hopefully have a pull request ready at some point tomorrow.

    1. Thank you for the patch. I will incorporate it.
    2. The AVX2 requirement is an oversight on my part. Thank you for pointing it out. I will correct that.
    3. Modern C++ has a number of nice features (expanded applicability of constexpr, auto, template deduction, etc) and is well-supported among recent compilers. I don't see a compelling reason not to use it.
    4. I will look into the compression regression you noted, and see if it affects any other files as well.
    5. Your log2htm program is quite neat. I will play around with it.
    Last edited by Eppie; 28th January 2020 at 09:09.

  9. #9
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    Good news and bad news:
    Good news: I used git bisect to identify the commit that introduced the compression regression. Bad news: the commit is fairly large, so it's going to take me a while longer to identify the exact cause and fix it. In the meantime, I've pushed a commit to my fork which should address the AVX2 requirement and the Windows compilation. Would you mind verifying those fixes for me, @Shelwien?

  10. Thanks (2):

    schnaader (28th January 2020), Shelwien (28th January 2020)

  11. #10
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    For some reason you restored the weird #include inside of a function:

    auto main(int argc, char **argv) -> int {
    #ifdef WINDOWS
    // On Windows, argv is encoded in the effective codepage, therefore unsuitable for acquiring command line arguments (file names
    #ifdef WINDOWS
    #include "shellapi.h"
    #pragma comment(lib,"shell32.lib")
    #endif

    // in our case) not representable in that particular codepage.
    // -> We will recreate argv as UTF8 (like in Linux)
    uint32_t oldcp = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);


    Also, I found the reason for the clang crashes: clang doesn't save registers when it uses the MS version of the cpuid intrinsic, so we have to use the gcc version.
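
    For reference, a sketch of how the two intrinsics can be dispatched (illustrative only; the exact paq8px code may differ):

    Code:
    // Use the gcc/clang cpuid wrapper where available; fall back to the MSVC
    // intrinsic only when actually compiling with MSVC.
    #include <cstdint>
    #ifdef _MSC_VER
      #include <intrin.h>
    #else
      #include <cpuid.h>
    #endif
    
    static void cpuidWrapper(uint32_t leaf, uint32_t out[4]) {
    #ifdef _MSC_VER
      int regs[4];
      __cpuid(regs, static_cast<int>(leaf));
      for (int i = 0; i < 4; ++i) out[i] = static_cast<uint32_t>(regs[i]);
    #else
      // __get_cpuid declares its register clobbers, so gcc/clang preserve state correctly
      __get_cpuid(leaf, &out[0], &out[1], &out[2], &out[3]);
    #endif
    }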
    Attached Files

  12. #11
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    @Shelwien: That was an oversight on my part; it should be fixed now. I also applied your fix for the clang crash, and I identified a regression in the compression of BOOK1 and pushed a fix for it to the master branch of my fork. The compression is improved, but not quite equal to what was obtained by v183fix1. I will continue to search for the cause of the remaining difference.

    I will follow up with a question about the mixer architecture in paq later today.

  13. Thanks:

    Shelwien (2nd February 2020)

  14. #12
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    I added some logging so I can try to understand things better. Here is a snippet from the output:
    Code:
    Created Mixer with n = 1408, m = 72368, s = 28
    Created SIMDMixer with n = 1406, m = 72368, s = 28
    Created Mixer with n = 32, m = 1, s = 1
    Created SIMDMixer with n = 28, m = 1, s = 1
    I think I understand this. First, a SIMDMixer is created (by the MixerFactory, which is called from the ContextModel). That SIMDMixer has 72368 neural networks, each of which has 1406 inputs. Of the 72368 networks, up to 28 may be "selected"[0]. The constructor of SIMDMixer calls the constructor of Mixer, and sets n to be a number that is a multiple of the simdWidth, which is why we see n = 1408 on the first line, not 1406. (1408 is evenly divisible by 32). This first mixer then creates a second mixer that has just one neural network, with 28 inputs. That second mixer is used to combine the outputs from the 28 "selected" neural networks in the first mixer.
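
    If my reading of those numbers is right, the padding is just a round-up to the SIMD stride; a tiny sketch of that assumption (not the actual paq8px code):

    Code:
    // Assumed rounding behind the n = 1406 -> 1408 (and 28 -> 32) log lines:
    // pad the number of inputs up to a multiple of the SIMD stride.
    #include <cstdio>
    
    constexpr int roundUpToMultiple(int n, int width) {
      return ((n + width - 1) / width) * width;
    }
    
    int main() {
      std::printf("%d\n", roundUpToMultiple(1406, 32));  // 1408, as in the first Mixer line
      std::printf("%d\n", roundUpToMultiple(28, 32));    // 32, as in the second Mixer line
      return 0;
    }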

    Questions:
    [0]: What does it mean to be "selected"? Does it mean that only those networks are used for prediction, and only those networks are trained?
    [1]: Does the number of neural networks correspond to the number of contexts that are being modeled?
    [2]: Different neural networks are allowed to use different numbers of inputs, right? n = 1406 is just an upper limit, right?

    I have more questions but I'll save them for the next post.

  15. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    > That SIMDMixer has 72368 neural networks, each of which has 1406 inputs. Of
    > the 72368 networks, up to 28 may be "selected"

    To start, you have to understand that paq is not really NN-based.
    Matt wanted it to be, and there's some similarity, but in the end mixing
    is mixing and NN is NN.

    1) "72368 neural networks" is actually "72368 neurons":
    https://en.wikipedia.org/wiki/Artifi...asic_structure
    but I'd just say "72386 weight arrays".

    2) "1406 inputs" are the same for each of weight arrays.

    3) There's a concept of "context" which isn't used with the same meaning
    (connecting parts of the NN based on the data) in NN theory.

    On the other hand, PPM/CM existed without any NN theory or logistic mixing,
    and I think it provides a better foundation for PAQ too,
    since NN theory doesn't give us the "physical meaning" of the weights,
    or different implementations of mixing.

    So my explanation of the paq Mixer class would be like this:
    1) Logistic mixing provides a way to take n submodel predictions
    and n corresponding weights, and compute a single mixed prediction,
    where each submodel input contributes with the given weight.
    2) Sometimes it makes sense to use different weight arrays in different contexts,
    even with inputs always supplied by the same submodels.
    For example, increasing the weight of a submodel that gives a good prediction for bit7
    (which is always 0) usually isn't a good idea for predicting bit6 after that.
    3) Sometimes we also don't know when and which mixer context is good.
    In this case we can mix multiple predictions from the same inputs
    with different mixer contexts, then mix these predictions again,
    with weights that represent the performance of these different contexts.

    The PAQ Mixer class implements the idea described above -
    n is the number of submodel predictions on input (and also the size of the basic weight array),
    m is the total number of contexts in all layer1 mixers,
    s is the number of layer1 mixers (each with its own context),
    which produce layer1 mixed predictions that are fed to the layer2 mixer.
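
    To make that concrete, here is a small floating-point sketch of one context-selected mixing set (illustrative only - the names and learning-rate handling are mine, not the paq8px integer/SIMD implementation):

    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>
    
    static double stretch(double p) { return std::log(p / (1.0 - p)); }
    static double squash(double x)  { return 1.0 / (1.0 + std::exp(-x)); }
    
    class MixingSet {
      std::size_t n;                       // number of submodel predictions
      std::vector<std::vector<double>> w;  // one weight array per context
      std::size_t ctx = 0;                 // context selected for the current bit
      std::vector<double> st;              // stretched inputs of the last mix
      double lastP = 0.5;
      double lr;                           // learning rate
    public:
      MixingSet(std::size_t nInputs, std::size_t nContexts, double learningRate = 0.002)
          : n(nInputs), w(nContexts, std::vector<double>(nInputs, 0.0)),
            st(nInputs, 0.0), lr(learningRate) {}
    
      void setContext(std::size_t c) { ctx = c; }
    
      // Mix n probabilities (each in (0,1)) into one probability,
      // using only the weight array selected by the current context.
      double mix(const std::vector<double> &p) {
        double dot = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
          st[i] = stretch(p[i]);
          dot += w[ctx][i] * st[i];
        }
        return lastP = squash(dot);
      }
    
      // After the actual bit y is known, update only the selected weight array.
      void update(int y) {
        double err = y - lastP;            // gradient of the coding cost w.r.t. the dot product
        for (std::size_t i = 0; i < n; ++i)
          w[ctx][i] += lr * err * st[i];
      }
    };
    A second, single-context MixingSet that takes the s layer1 outputs as its inputs gives the two-layer arrangement described above.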

    > Questions:
    > [0]: What does it mean to be "selected"? Does it mean that only those
    > networks are used for prediction, and only those networks are trained?

    Yes, mixing is attempted with s different contexts,
    and only the weight arrays corresponding to those contexts are updated.

    > [1]: Does the number of neural networks correspond to the number of contexts that are being modeled?

    There are different kinds of contexts here - normally we talk about contexts for data statistics.
    A Mixer instance receives abstract predictions and has its own context sets.

    As example:
    // we have 2 inputs: p1,p2
    p3 = Mixer1[o1_ctx].mix(p1,p2);
    p4 = Mixer2[o2_ctx].mix(p1,p2);
    p = Mixer3.mix(p3,p4);
    #define DIM(x) (sizeof(x)/sizeof(x[0]))
    m = DIM(Mixer1)+DIM(Mixer2);


    To answer the question, the number of weight arrays
    (which are called "neural networks" in the paq source for some reason)
    does correspond to the total number of contexts in all mixing sets.
    But this doesn't have any direct relation to the contexts of the mixer inputs.

    > [2]: Different neural networks are allowed to use different numbers of inputs, right?
    > n = 1406 is just an upper limit, right?

    Well, since they're not really "neural networks"...
    The same inputs are used with all weight arrays, so the number of inputs
    and the number of weights are always the same.

    Only the number of contexts in the mixing sets can differ.

  16. #14
    Member
    Join Date
    Aug 2017
    Location
    United States
    Posts
    33
    Thanks
    19
    Thanked 23 Times in 16 Posts
    @Shelwien: Thanks for the detailed reply. I need some time to digest it and understand what you wrote. In the meantime, I believe I fixed the remaining compression regression. Would you be so kind as to test the fix that I pushed to the master branch of my fork?

  17. #15
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    The size seems to be correct this time, but there are issues with non-AVX2 clang.
    First, clang didn't want to compile the AVX2 code in Mixer.hpp.
    Then, after some patches, it compiled, but it tries to use the AVX2 mixer, which was disabled for the SSE4 target.

    Also attached is an archive with Windows binaries from 3 different compilers for testing.
    Attached Files

  18. #16
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,217
    Thanks
    743
    Thanked 495 Times in 383 Posts
    Quote Originally Posted by Shelwien View Post
    I got the repository code to compile on Windows with some patches.
    Surprisingly, clang was the hardest to get working, and in the end the clang version doesn't work (it crashes).
    There was some really weird stuff like #include <...> within a function.
    Also, somehow it now doesn't compile without AVX2 and C++17 - I think that's a bad development, since the original worked with C++98 and no vectors at all.
    (Also, x86 is not the only CPU.)

    But the main problem is this:
    Code:
    book.paq8px183fix1 -7 184531
    book.paq8px184 -7     185858
    I have two small questions about this above:

    1. Is this paq8px184 version official and available, or is it just some working paq8pxe version that was renamed?
    2. Does the paq8pxe_v1 version need the dictionary files from paq8px?

  19. #17
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,305 Times in 741 Posts
    Andrew Epstein (@Eppie) did some major refactoring of paq8px183fix1 here: https://github.com/andrew-epstein/paq8px
    Jan Ondrus maintains the main repository of paq8px here: https://github.com/hxim/paq8px
    He seemingly accepted the recent changes, but there's a significant delay for new commits.

    Eppie's version still has some issues (like worse compression of book1) which are getting fixed in the epstein repository,
    but are not immediately mirrored by the hxim repository.

    > 1. Is this paq8px184 version official and available, or is it just some working paq8pxe version that was renamed?

    Since Jan Ondrus accepted it, yes, it should be official.

    > 2. Does the paq8pxe_v1 version need the dictionary files from paq8px?

    "pxe" is my naming for px184 patches. We still don't have a stable 184.

    Yes, it does need dictionary files from 183.

  20. Thanks (2):

    Darek (5th February 2020), Eppie (5th February 2020)
