Page 1 of 2
Results 1 to 30 of 35

Thread: fcm1 - open source order-1 cm encoder

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts

    fcm1 - open source order-1 cm encoder

    OK, let me introduce a new encoder. Continuing the line of fpaq0f2-like experimental encoders, I made a new one - check out my variant of an order-1 CM. Source code included:
    fcm1.zip


  2. #2
    The Founder encode's Avatar


    Brief description:

    Each counter consists of two states a la FPAQ0P - a fast one and a slow one - which are summed together.

    To mix the probabilities from order-1 and order-0 we use the following formula:

    ((p2*3)+p1)/4

    p2 - the probability from the order-1 context
    p1 - the probability from the order-0 context

    If p2 = 0.5, i.e. the order-1 context occurs for the first time, we use the probability from order-0 only.

    Any thoughts?

    EDIT:
    In the final version, we use:
    ((p2*15)+p1)/16
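    The counter and mixing scheme described above can be sketched roughly like this. This is a minimal illustration, not the fcm1 source: I assume a 12-bit probability scale and illustrative adaptation shifts, while the real code uses its own scale and a p2 != (1<<16) test for the first-occurrence case.

```cpp
#include <cassert>

// Two-state counter a la FPAQ0P: a fast-adapting and a slow-adapting
// probability, combined for the final estimate. 12-bit scale (0..4096);
// the shift amounts 3 and 6 are illustrative, not the tuned values.
struct Counter {
  int fast = 2048;  // adapts quickly (shift 3)
  int slow = 2048;  // adapts slowly (shift 6)
  int P() const { return (fast + slow) / 2; }
  void Add(int y) {
    if (y) { fast += (4096 - fast) >> 3; slow += (4096 - slow) >> 6; }
    else   { fast -= fast >> 3;          slow -= slow >> 6; }
  }
};

// Static mix of order-1 (p2) and order-0 (p1), final-version weights:
int Mix(int p2, int p1) {
  if (p2 == 2048)   // order-1 context seen for the first time (p2 = 0.5)
    return p1;      // fall back to order-0 alone
  return (p2 * 15 + p1) / 16;
}
```

    On this scale, Mix(2048, p1) returns p1 unchanged, matching the first-occurrence rule from the post.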

  3. #3
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Quote Originally Posted by encode View Post
    OK, let me introduce a new encoder. Continuing the line of fpaq0f2-like experimental encoders, I made a new one - check out my variant of an order-1 CM. Source code included:
    fcm1.zip

    Thanks Ilia! I have uploaded it to my site along with my own speed optimised compile.

    Mirror: Download

  4. #4
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    One can't generalize this, but you give far too much weight to order 0. The 2D SSE visualizations (SSE for mixing) in the old form showed that the SSE function mostly looked like a plane, dominated by the order-1 model. That means order 0 gave almost no contribution.
    How much does your slow counter/fast counter mix improve over a standalone counter?

  5. #5
    The Founder encode's Avatar
    Quote Originally Posted by toffer View Post
    How much does your slow counter/fast counter mix improve over a standalone counter?
    A notable difference... You may check it for yourself!

  6. #6
    The Founder encode's Avatar
    Updated the archive with a new compile which is almost 2X faster!

  7. #7
    Programmer toffer's Avatar
    Hm, as I said, giving order 0 such a high weight is totally wrong:

    -> 22.459.785 bytes (plain order1) vs. 23.783.905 bytes (orders01) on SFC

    I just changed this:

    Code:
    //  int P() {
    //    int p1=counter0[c0].P();
    //    int p2=counter1[c1][c0].P();
    //
    //    return (p2!=(1<<16)?((p2*3)+p1)>>7:p1>>5);
    //  }
    
      int P() {
        return counter1[c1][c0].P()>>5;
      }

  8. #8
    The Founder encode's Avatar
    I'm not so sure. It's a matter of tuning for a specific file type and a specific situation...

    The static weights from PAQ1 are: 4, 9, 16, ...

    sfc.7z:
    p2*3: 23,789,646 bytes
    p2*7: 22,869,851 bytes
    p2*15: 22,535,420 bytes
    plain order-1: 22,487,664 bytes
    p2*31: 22,425,582 bytes
    p2*63: 22,403,145 bytes
    ...


  9. #9
    Programmer toffer's Avatar
    ATM you exactly confirm what I said - "order 0 such a high weight is totally wrong" - and your results get better the more heavily you weight order 1...

  10. #10
    The Founder encode's Avatar
    Actually, that's true only for such a "simple" and light version. BALZ uses a similar but heavier and more complex CM encoder, and there weight=3 is the best so far. Also, for such a simple model we may try to tune the counters' speeds - i.e. not 3/5 but 3/6, 4/6, etc...

  11. #11
    Programmer toffer's Avatar
    Counter tuning is not very exciting, believe me - I'm doing it all the time :/ But you can really squeeze out additional performance. As I said, the gain is somewhere between 0.1 - 1% for my counter in its simplest form (over ">>x" counters).

    If you refer to "shift right by 3 or 5 bits", well, that's too coarse. Simply changing the weight of the old statistics from 7/8 (= shr 3) to, e.g., 15/16 (= shr 4) (these are fractions) leaves several quantization levels untouched.
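    The shift-vs-fraction relation can be sketched as a generic ">>k" counter (a sketch on an assumed 12-bit probability scale; k = 3 retains 7/8 of the old statistics per update, k = 4 retains 15/16):

```cpp
#include <cassert>

// A plain ">>k" counter keeps old statistics with weight (2^k - 1)/2^k:
//   p' = p + ((target - p) >> k)  ~=  p*(1 - 1/2^k) + target/2^k
// so k = 3 keeps 7/8 of the old value, k = 4 keeps 15/16, and each
// increment of k halves the adaptation speed - a coarse granularity.
int Update(int p, int y, int k) {
  int target = y ? 4096 : 0;  // 12-bit probability scale (assumed)
  return p + ((target - p) >> k);
}
```

    Note the jump from k = 3 to k = 4 doubles the memory of the counter in one step; a finer-grained rate needs a multiplication, as discussed below.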

    However, good results will require at least one multiplication per bit-model adaptation (see my fpaq0 variant), which will make things slower.
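    A multiplication-based adaptation might look like this (an assumption of the general technique, not toffer's actual code): the learning rate becomes a free fixed-point parameter instead of a power of two, at the cost of one multiply per modeled bit.

```cpp
#include <cassert>

// Counter update with an arbitrary (non-power-of-two) learning rate.
// p is a 12-bit probability; rate16 is a 16-bit fixed-point fraction
// (rate16 = 8192 corresponds to 1/8, i.e. the ">>3" counter above).
int UpdateMul(int p, int y, int rate16) {
  int target = y ? 4096 : 0;
  return p + (((target - p) * rate16) >> 16);
}
```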

    EDIT: BTW, is your weighting in BALZ static?

  12. #12
    The Founder encode's Avatar
    Quote Originally Posted by toffer View Post
    As I said, the gain is somewhere between 0.1 - 1% for my counter in its simplest form (over ">>x" counters).
    What is YOUR counter? What does it look like?

    Quote Originally Posted by toffer View Post
    However, good results will require at least one multiplication per bit-model adaptation (see my fpaq0 variant), which will make things slower.
    Can you give an exact link - it looks like your version isn't on Matt's site!

    Quote Originally Posted by toffer View Post
    EDIT: BTW, is your weighting in BALZ static?
    Yes, of course!

  13. #13
    Programmer toffer's Avatar
    Have a look at the parameter optimization thread:

    http://encode.su/forum/showthread.php?t=18

    EDIT: Meanwhile I found better parameters for enwik8; since I use an iterative optimization, the optimal result depends on the initial estimate. If you want, I can post them next week (they're left at university; I'm at home right now).
    Last edited by toffer; 23rd May 2008 at 23:42.

  14. #14
    Moderator

    Quote Originally Posted by encode View Post
    Updated the archive with a new compile which is almost 2X faster!
    Quote Originally Posted by toffer View Post
    Hm, as I said, giving order 0 such a high weight is totally wrong:

    -> 22.459.785 bytes (plain order1) vs. 23.783.905 bytes (orders01) on SFC

    I just changed this:

    Code:
    //  int P() {
    //    int p1=counter0[c0].P();
    //    int p2=counter1[c1][c0].P();
    //
    //    return (p2!=(1<<16)?((p2*3)+p1)>>7:p1>>5);
    //  }
    
      int P() {
        return counter1[c1][c0].P()>>5;
      }
    Thanks Ilia, and toffer! I have updated my archive.

    Mirror: Download

  15. #15
    The Founder encode's Avatar
    And as a final update - I updated the archive again, this time with weights tuned for enwik8. For enwik the best value is 15.

  16. #16
    Moderator

    I'll wait a while before I update my archive again!

  18. #18
    The Founder encode's Avatar
    Quote Originally Posted by toffer View Post
    Have a look at the parameter optimization thread:

    http://encode.su/forum/showthread.php?t=18

    EDIT: Meanwhile I found better parameters for enwik8; since I use an iterative optimization, the optimal result depends on the initial estimate. If you want, I can post them next week (they're left at university; I'm at home right now).
    OK, thanks!

  19. #19
    Moderator

    I have updated my archive again. All files are included.

    Mirror: Download

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    Order-1 fpaq0f2

    I modified fpaq0f2 to pure order 1.

    enwik8 43,898,745, 45 sec, 46 sec
    enwik9 413,824,018, 491 sec

    (compare with fcm1: 45,402,225, 23, 26 sec; 447,305,681, 228, 261 sec).

    Predictor code is below. I didn't post the whole program, but you can just paste this into fpaq0f2.cpp. The rest of the program is unchanged.

    Recall that fpaq0f2 maps each context to an 8-bit history, then maps that along with the context to a prediction using a StateMap. To make it order 1, the StateMap size is increased to 2^24 (8-bit history, 16-bit context), using 64 MB of memory. This makes it twice as slow due to cache misses. The only other change was to tune the StateMap limit to 255 (was 95).

    Yeah, I know - is it really order 1? I think so, because there are 64K counters and I am modeling whether each one is stationary or nonstationary.

    Code:
    class Predictor {
      int cxt;  // Context: 0=not EOF, 1..255=last 0-7 bits with a leading 1
      int c1;   // last byte << 8
      StateMap sm;
      unsigned char state[0x10000];
    public:
      Predictor();
    
      // Assume stream of 9-bit symbols
      int p() {
        return sm.p(c1+cxt<<8|state[c1+cxt]);
      }
    
      void update(int y) {
        sm.update(y, 255);
        (state[c1+cxt]*=2)+=y;
        if ((cxt+=cxt+y) >= 256) {
          c1=cxt<<8&0xff00;
          cxt=0;
        }
      }
    };
    
    Predictor::Predictor(): cxt(0), c1(0), sm(0x1000000) {
      for (int i=0; i<0x10000; ++i)
        state[i]=0x66;
    }

  22. #22
    The Founder encode's Avatar
    Thanks a lot Matt!

    The only purpose of fcm1 is to test the idea of mixing the probabilities from different orders directly, without bit counts, etc. As you can see, it has a nice compression/speed balance, since it is still faster than the initial fpaq0.

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Why don't you add some blockwise optimization there?
    I mean, you could store the probability sets for e.g. 64k blocks,
    then find the optimal weights and store them in the block header.
    That way, decoding would keep the same speed as now, but
    compression might improve significantly.
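    A rough sketch of this blockwise idea (purely illustrative - `Obs` and `BestWeight` are made-up names, and a real implementation would reuse the coder's fixed-point math): record the (p1, p2, bit) triples for a block, evaluate each candidate weight's coding cost in bits, and store the winner in the block header.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One observation per coded bit: both model probabilities and the bit.
// Probabilities are on an assumed 12-bit scale (0..4096).
struct Obs { int p1, p2, y; };

// Pick the static weight (out of 16 levels) that minimizes the coded
// size of one block; cost is measured in bits via -log2(p).
int BestWeight(const std::vector<Obs>& block) {
  int best_w = 1;
  double best_cost = 1e300;
  for (int w = 1; w < 16; ++w) {
    double cost = 0;
    for (const Obs& o : block) {
      double p = (o.p2 * w + o.p1 * (16 - w)) / (16.0 * 4096.0);
      cost -= std::log2(o.y ? p : 1.0 - p);
    }
    if (cost < best_cost) { best_cost = cost; best_w = w; }
  }
  return best_w;  // store this in the block header
}
```

    As Shelwien notes, the decoder only reads the chosen weight, so its speed is unchanged; the search cost is paid once per block at compression time.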

  24. #24
    The Founder encode's Avatar
    Quote Originally Posted by Shelwien View Post
    Why don't you add some blockwise optimization there?
    I mean, you could store the probability sets for e.g. 64k blocks,
    then find the optimal weights and store them in the block header.
    That way, decoding would keep the same speed as now, but
    compression might improve significantly.
    Well, that's a different idea. Here I'm just testing some thoughts...
    The second thing I will try is adaptive weighting - i.e. each time checking which order was more correct:

    During update:
    Code:
    if (y) {
      if (p2>p1)
        wt2++;
      else
        wt1++;
    }
    else {
      if (p2<p1)
        wt2++;
      else
        wt1++;
    }
    A dummy example off the top of my head. Of course the weights may be context-sensitive etc. A simple and fast approach.

  25. #25
    Administrator Shelwien's Avatar
    Such statistics (the number of occurrences when a context's prediction was the best) are used in ppmy/ash, where they're called "optimals".
    I found that it's another useful value for an SSE context (along with escape counts etc), but even an optimized weighting function based on these values
    is worse than a proper adaptive mixer... and slower too.
    Still, you may check out the weighting function in ash's source.

  26. #26
    The Founder encode's Avatar
    I doubt that it's slow. Check this out:

    Code:
    // wt is our weight for the counter balance - ranging from 1 to 63, if we use 64 points
    
    int w1=64-wt;
    int w2=wt;
    
    return (((p1*w1)+(p2*w2))>>X);

  27. #27
    Administrator Shelwien's Avatar
    Well, I meant _this_ is slow:
    Code:
    inline float Weighten(
      Node& p, // context node
      int Ord, // context order
      int N    // symbols in context
    ) {
      int& CurOrder = Order; // model order
      int i = __min(CurOrder,MY-1); // table limits
      int j = __min(Ord,MX-1);
      int k = N==1; // determinated context?
      float WEsc,WOpt,WInv;
      WOpt = WO[k][i-1][j]; // static parameters
      WEsc = WE[k][i-1][j]; // WA[] and WB[] are the same
      WInv = WA[k][i][j]*p.W + WB[k][i][j]; // p.W/p.T is escape probability
      WInv*= (WOpt+p.O); // p.O/p.Q is best prediction probability
      WInv/= (WEsc+p.Q)*(WOpt+p.T);
      if( WInv<0 ) WInv=0;
      return WInv;
    }

  28. #28
    The Founder encode's Avatar
    Briefly tested the idea:

    Code:
    #define WSCALE 128
    // ...
    
      // initially wt may be initialized to WSCALE/4 or so.
      TPredictor(): wt(WSCALE/4), c0(1), c1(0) {}
    
      int P() {
        int p1=counter0[c0].P();
        int p2=counter1[c1][c0].P();
    
        int w1=WSCALE-wt;
        int w2=wt;
    
        return (((p1*w1)+(p2*w2))>>12);
      }
    
      void Update(int y) {
        int p1=counter0[c0].P();
        int p2=counter1[c1][c0].P();
    
        if (y) {
          if (p2>=p1)
            wt+=(wt<(WSCALE-1));
          else
            wt-=(wt>(WSCALE/4));
        }
        else {
          if (p2<=p1)
            wt+=(wt<(WSCALE-1));
          else
            wt-=(wt>(WSCALE/4));
        }
    
        counter0[c0].Add(y);
        counter1[c1][c0].Add(y);
    
        if ((c0+=(c0+y))>=256) {
          c1=(unsigned char)c0;
          c0=1;
        }
      }
    // ...
    sfc.7z:
    fcm1: 22,535,420 bytes
    fcm1w: 22,246,539 bytes

    On the ENWIKs this trick gives no improvement, since they are fairly stationary and order-1 already works pretty well there. Well, maybe if we use the weight with some context...

  29. #29
    The Founder encode's Avatar
    Damn, of course for the weight context we SHOULD use at least c0, i.e. the bits of the current byte!

    sfc.7z:
    22,181,086 bytes
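    Turning the single wt into a per-c0 table might look like this (a sketch following the style of the post #28 code; I assume a 12-bit probability scale, hence the >>7 here instead of the >>12 used there, and the names are illustrative):

```cpp
#include <cassert>

const int WSCALE = 128;

// One adaptive weight per partial-byte context c0 (1..255), instead of
// a single global weight. The update rule follows post #28: bump the
// weight toward whichever order predicted the bit better.
struct WeightTable {
  int wt[256];
  WeightTable() { for (int i = 0; i < 256; ++i) wt[i] = WSCALE / 4; }

  // Weighted mix of order-0 (p1) and order-1 (p2), 12-bit scale:
  int Mix(int c0, int p1, int p2) const {
    return (p1 * (WSCALE - wt[c0]) + p2 * wt[c0]) >> 7;  // /WSCALE
  }

  void Update(int c0, int y, int p1, int p2) {
    bool order1_won = y ? (p2 >= p1) : (p2 <= p1);
    if (order1_won) wt[c0] += (wt[c0] < WSCALE - 1);
    else            wt[c0] -= (wt[c0] > WSCALE / 4);
  }
};
```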


  30. #30
    The Founder encode's Avatar
    And with 64 points, sfc.7z:
    22,164,766 bytes

    Now we may separate the entire mixing stuff into a TMixer class... Also I will do some experiments with the weight update.
