Results 1 to 4 of 4

Thread: Improving CM state machines

  1. #1
    Member
    Join Date
    Jun 2013
    Location
    Canada
    Posts
    36
    Thanks
    24
    Thanked 47 Times in 14 Posts

    Improving CM state machines

    Has anybody else done experiments to improve CM state machines based on data? I ran the lpaq3 state machine on a basic algorithm that would randomly change transitions and keep the change if compression improved on enwik6. This state machine is very similar to the current MCM one. The new state machine had the following improvements when inserted into lpaq3:

    enwik8:
    sfc: 10,624,555 -> 10,601,687
    enwik8: 19,122kB -> 19,057kB

    State machine attached, states are sorted in depth first order. Discuss.
    Attached Files Attached Files

  2. Thanks (2):

    encode (30th June 2013),Nania Francesco (29th June 2013)

  3. #2
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I tried the States and there are actually improvements of 1-2% Although the match (LZP) would seem to have a minor effect on file with high entropy.
    I think it is convenient to all of us that use this compression CM system set for each file type has its own specific entropy a table type and doing the accounts would not affect much on the weight of the program! For example if we set 16 types of table at the end would have been 16 * 512! not bad as I use lpaq
    Fantastic job!
    Last edited by Nania Francesco; 29th June 2013 at 15:55.

  4. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I wrote a program to analyze the state table (code below). For each state, it lists the 10 shortest bit sequences (up to 28 bits) that lead to that state as well as the shortest path. For some states, there are no sequences shorter than 28. These are states 29-52 consisting of runs of up to 50 zero bits, and states 53, 235, 236 consisting of long runs of zero bits followed by a 1. Curiously, there are no states to represent a sequence longer than 14 ones.

    The output is rather long, so I will just post the code. It takes about 10 seconds to run.

    Code:
    // Public domain
    #include <stdio.h>
    #include <stdlib.h>
    #include <vector>
    using namespace std;
    
    static const unsigned char st[256][2]=
    {{2,168} ,{2,168}  ,{3,163}  ,{4,163}  ,{5,165}  ,{6,89}   ,{7,245}  ,{8,217}  ,{9,245}  ,{10,245} , // 0-9
    {11,233} ,{12,244} ,{13,227} ,{14,74}  ,{15,221} ,{16,221} ,{17,218} ,{18,226} ,{19,243} ,{20,218} , // 10
    {21,238} ,{22,242} ,{23,74}  ,{24,238} ,{25,241} ,{26,240} ,{27,239} ,{28,224} ,{29,225} ,{30,221} , // 20
    {31,232} ,{32,72}  ,{33,224} ,{34,228} ,{35,223} ,{36,225} ,{37,238} ,{38,73}  ,{39,167} ,{40,76}  , // 30
    {41,237} ,{42,234} ,{43,231} ,{44,72}  ,{45,31}  ,{46,63}  ,{47,225} ,{48,237} ,{49,236} ,{50,235} , // 40
    {51,53}  ,{52,234} ,{47,53}  ,{54,234} ,{55,229} ,{56,219} ,{57,229} ,{58,233} ,{59,232} ,{60,228} , // 50
    {61,226} ,{62,72}  ,{63,74}  ,{64,222} ,{65,75}  ,{66,220} ,{67,167} ,{68,57}  ,{69,218} ,{6,70}   , // 60
    {71,1}   ,{71,72}  ,{71,73}  ,{61,74}  ,{75,217} ,{56,76}  ,{77,167} ,{78,79}  ,{77,79}  ,{80,166} , // 70
    {81,162} ,{82,162} ,{83,162} ,{84,162} ,{85,165} ,{86,89}  ,{87,89}  ,{88,165} ,{77,89}  ,{90,162} , // 80
    {91,93}  ,{92,93}  ,{80,93}  ,{94,161} ,{95,100} ,{96,93}  ,{97,93}  ,{98,93}  ,{99,93}  ,{90,93}  , // 90
    {101,161},{94,102} ,{103,120},{101,104},{102,105},{104,106},{107,108},{104,106},{105,109},{108,110}, // 100
    {111,160},{112,134},{113,108},{114,108},{115,126},{116,117},{92,117} ,{118,121},{94,119} ,{103,120}, // 110
    {119,107},{122,124},{123,117},{94,117} ,{113,125},{126,127},{113,124},{128,139},{129,130},{114,124}, // 120
    {131,133},{132,109},{112,110},{134,135},{111,110},{134,136},{110,137},{134,138},{134,127},{128,140}, // 130
    {128,141},{142,145},{143,144},{115,124},{113,125},{142,146},{128,147},{148,151},{149,125},{79,150} , // 140
    {148,127},{142,152},{148,153},{150,154},{155,156},{149,139},{157,158},{149,139},{159,156},{149,139}, // 150
    {131,130},{101,117},{98,163} ,{115,164},{114,141},{91,163} ,{79,147} ,{58,168} ,{143,169},{170,199}, // 160
    {129,171},{128,172},{110,173},{174,177},{128,175},{176,171},{129,171},{174,178},{179,180},{174,172}, // 170
    {176,181},{141,182},{157,183},{179,184},{185,186},{157,178},{187,189},{188,181},{1,181}  ,{151,190}, // 180
    {191,193},{192,182},{188,182},{187,194},{172,195},{175,196},{170,197},{152,198},{185,169},{170,200}, // 190
    {176,201},{170,202},{203,204},{148,180},{185,205},{203,206},{185,207},{192,208},{209,210},{188,194}, // 200
    {211,212},{192,184},{213,215},{214,193},{188,184},{216,208},{1,193}  ,{84,163} ,{54,219} ,{54,1}   , // 210
    {221,94} ,{54,217} ,{55,223} ,{85,224} ,{69,225} ,{63,76}  ,{56,227} ,{86,217} ,{58,229} ,{230,219}, // 220
    {231,79} ,{57,86}  ,{229,165},{56,217} ,{224,214},{54,225} ,{54,216} ,{66,216} ,{58,234} ,{54,75}  , // 230
    {61,214} ,{57,237} ,{222,74} , {78,74} ,{85,163} ,{82,217} ,{0,0}    ,{0,0}    ,{0,0}    ,{0,0}    , // 240
    {0,0}    ,{0,0}    ,{0,0}    ,{0,0}    ,{0,0}    ,{0,0}};                                            // 250
    
    vector<int> v[256];
    
    int main() {
    
      // Calculate final state for all sequences of up to 28 bits. Save first 10 per state.
      bool done=false;
      for (int i=0; i<28 && !done; ++i) {
        done=true;
        for (int j=0; j<(1<<i); ++j) {
          int s=0;
          for (int k=0; k<i; ++k)
            s=st[s][j>>k&1];
          if (v[s].size()<10) {
            v[s].push_back(1<<i|j);
            done=false;
          }
        }
      }
    
      // Calculate transitions neede to reach each state
      int dist[256]={0};
      dist[0]=1;
      for (int i=1; i<=256; ++i) {
        for (int j=0; j<256; ++j) {
          if (dist[j]==i) {
            if (dist[st[j][0]]==0) dist[st[j][0]]=i+1;
            if (dist[st[j][1]]==0) dist[st[j][1]]=i+1;
          }
        }
      }
    
      // Print sequences by state
      for (int i=0; i<256; ++i) {
        if (dist[i])
          printf("State %d reachable in %d steps\n", i, dist[i]-1);
        else
          printf("State %d not reachable\n", i);
        for (int j=0; j<int(v[i].size()); ++j) {
          printf("%3d: ", i);
          unsigned x=v[i][j];
          while (x>1) {
            printf("%d", x&1);
            x>>=1;
          }
          printf("\n");
        }
        printf("\n");
      }
      return 0;
    }

  5. Thanks (3):

    Jan Ondrus (1st July 2013),Mat Chartier (1st July 2013),Nania Francesco (1st July 2013)

  6. #4
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I guess it's not so much the analysis that serves as the number of 1 and 0 representing the States as in Paq8 to calculate then the probability

Similar Threads

  1. Improving LZ78/LZW?
    By RichSelian in forum Data Compression
    Replies: 7
    Last Post: 19th September 2011, 19:05
  2. State machines
    By encode in forum Data Compression
    Replies: 3
    Last Post: 3rd November 2010, 22:58
  3. Improving RC4 (MC1 cipher proposal)
    By encode in forum The Off-Topic Lounge
    Replies: 14
    Last Post: 5th August 2010, 21:57
  4. Virtual test-machines
    By Vacon in forum Data Compression
    Replies: 7
    Last Post: 15th April 2009, 00:19
  5. Lines are blurring between humans and machines
    By encode in forum Forum Archive
    Replies: 0
    Last Post: 27th May 2007, 21:20

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •