
Thread: fpaq0f new order 0 model

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I have written a new order 0 model for fpaq0f that significantly improves compression. http://cs.fit.edu/~mmahoney/compression/text.html#6189

Code:
Compressor    enwik8      enwik9       Comp  Decomp
fpaq0mw     61,271,869  618,959,309    455   457    (old record holder)
fpaq0f      58,088,230  581,053,251    265   251

I left assertion checks on; compiling with -DNDEBUG should be about 10% faster.

    The main improvement is to include a bit history (as an 8 bit PAQ state) in each context for a total of 16 bits of context (that depends only on the current byte). This is mapped using a StateMap to the bit prediction and then arithmetic coded. There are some additional minor improvements:

    - the probability is coded as 16 bits instead of 12.
    - the StateMap probability precision is increased from 22 bits to 23.
    - reciprocal table is tuned (i -> 16K/(i+2.5) instead of 1.5).
    - arithmetic coder does a 32x16 bit multiply (no bits discarded before the multiply, as with PAQ).
    - EOF marker is inverted to simplify context update (always 0..255 even though 9 bits are used).
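
A minimal sketch of the bit-history + StateMap idea described above (simplified, with illustrative names and a fixed learning rate; the real fpaq0f StateMap uses a count-dependent rate):
Code:
#include <vector>

class StateMap {
  std::vector<int> t;   // p(bit=1) per context, 23 bits of precision
  int cxt;              // context of the last prediction
public:
  explicit StateMap(int n) : t(n, 1 << 22), cxt(0) {}   // start at p = 1/2

  // Predict: return p(bit=1) scaled to 16 bits for the given context.
  int p(int context) {
    cxt = context;
    return t[cxt] >> 7;                 // 23-bit state -> 16-bit output
  }

  // Update: move the stored probability toward the coded bit y (0 or 1).
  // A real coder also keeps the probability away from 0 and 1.
  void update(int y) {
    t[cxt] += ((y << 23) - t[cxt]) >> 7;
  }
};

// The context passed to p() is roughly
//   (bit_history_state << 8) | partial_byte
// i.e. 16 bits that depend only on the current byte.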

I was able to improve compression further to enwik8 = 58,011,000 by mixing the combined context with two smaller contexts (prior bits only, and bit history given prior bits only) with weights 14, 1, 1 respectively, and by speeding up adaptation of the combined context from 1/127 to 1/72. But this made it twice as slow, so I left it out.

    I need to try this out in a BWT compressor or something.

  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Matt!

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    My comparison: http://shelwien.googlepages.com/fpaq0f.htm
    IntelC binaries: http://shelwien.googlepages.com/fpaq0_bin.rar

Nice, but I'm still not sure that it's really order0

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
Well, now I'm sure that it's not order0, since order0 with an adaptive secondary model is not comparable to normal order0.

  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Shelwien!

  6. #6
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
Let me ask a question. I've been digging into some SSE / APM stuff, but I don't fully understand it yet. Can you give a briefing on modelling an LZ match-flag bit stream?

My coded match-flag stream looks like this:
Code:
0000000100100001
Zero (0) indicates a literal, one (1) indicates a match. So there is no periodic structure like the usual 8-bit alignment to exploit. Is there a good way to model a single bit stream like that? And my problem isn't only the match flags: my compressor also segments match lengths by their size, which requires two more bit streams of the same kind. I'm sure these fields are highly compressible. But how?
    BIT Archiver homepage: www.osmanturan.com

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    You can use the same type of modeling I did in fpaq0f but with 1 context instead of 256. Use the bit stream as input to a bit history state, then map that to a prediction using the StateMap.
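
A rough sketch of that suggestion (with a plain 8-bit shift register standing in for the PAQ bit-history state table; the names and update rate are illustrative, not taken from fpaq0f):
Code:
struct FlagModel {
  int p1[256];      // p(flag = 1) for each 8-bit history, 16-bit scale
  int history = 0;  // last 8 match flags

  FlagModel() { for (int i = 0; i < 256; ++i) p1[i] = 1 << 15; }

  // Probability that the next flag is a match (1), in [0, 65535].
  int predict() const { return p1[history]; }

  // After coding flag y (0 or 1): adapt the mapped probability toward y
  // (targets 1/65535 avoid zero probabilities), then shift the flag in.
  void update(int y) {
    p1[history] += ((y ? 65535 : 1) - p1[history]) >> 5;
    history = (history * 2 + y) & 255;
  }
};
The match-length streams can be handled the same way, each with its own history and table.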

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Quote Originally Posted by Shelwien
Well, now I'm sure that it's not order0, since order0 with an adaptive secondary model is not comparable to normal order0.
    Ah, but what about a context mixing order 0 where you mix a fast and slow updating probability? I am just doing the same thing on a finer scale.
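
For illustration, a toy version of that idea (a hedged sketch; the rates and weights here are invented, not fpaq0's): two order-0 probabilities adapted at different speeds and combined with a tiny logistic mixer.
Code:
#include <algorithm>
#include <cmath>

struct TwoRateOrder0 {
  double fast = 0.5, slow = 0.5;     // p(next bit = 1) at two update rates
  double w1 = 0.5, w2 = 0.5;         // mixer weights

  static double stretch(double p) {
    p = std::clamp(p, 1e-4, 1.0 - 1e-4);
    return std::log(p / (1.0 - p));
  }
  static double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }

  double predict() const {
    return squash(w1 * stretch(fast) + w2 * stretch(slow));
  }

  void update(int y) {                       // y = coded bit, 0 or 1
    double err = y - predict();
    w1 += 0.002 * err * stretch(fast);       // mixer gradient step
    w2 += 0.002 * err * stretch(slow);
    fast += (y - fast) / 32;                 // fast adaptation
    slow += (y - slow) / 1024;               // slow adaptation
  }
};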

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
>> Well, now I'm sure that it's not order0, since order0 with an adaptive
>> secondary model is not comparable to normal order0.
    >
    > Ah, but what about a context mixing order 0 where you mix a fast and slow
    > updating probability? I am just doing the same thing on a finer scale.

    You're kinda right in a sense, but the weight in a single adaptive mixer
    stores much less than a bit of context, and your map seemingly manages to store
    at least a few bits.

    Well... I also tried to get some proof of similarity between order1 and fpaq0f
    using this - http://shelwien.googlepages.com/bitrev.rar
    But got these results:
    http://shelwien.googlepages.com/fpaq0f_r.htm
And now I'm not sure how to interpret them

  10. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    It's like A.R.'s trick in lpaq8 where byte values 32 (space) and 31 are swapped. It shouldn't help compression, but it does.

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I made some more improvements: fpaq0f2:
    http://cs.fit.edu/~mmahoney/compression/text.html#6189

    Code:
     
    Compressor    enwik8      enwik9    Comp  Decomp 
    fpaq0mw     61,271,869  618,959,309  455   457 
    fpaq0f      58,088,230  581,053,251  265   251 
    fpaq0f2     56,916,872  558,645,708  222   207

    The main improvement was to replace that bit history state table with a simple 8 bit history consisting of just the last 8 bits. I also made some minor improvements: 24 bit precision in the StateMap (instead of 23), tuned the update rate, initialized the StateMap probabilities according to the number of 1 bits in the context, and initialized the context count to 6 instead of 0. The new version is faster because there is no state table lookup and because I compiled with assertions off.

    It is possible to improve compression for large files by using longer bit histories, but it is slower because the StateMap won't fit in cache. I also experimented with more complex bit histories like using a counter for long runs of zeros or ones, but it didn't help any. I tuned the update rate to 90 on enwik8, but smaller values (faster updates) like around 50-60 would compress better on less stationary data like calgary.tar.
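
Read that way, the per-bit context of fpaq0f2 can be sketched roughly like this (a simplified reading of the description above, reusing the simplified StateMap sketched earlier in the thread; not the actual source):
Code:
struct Order0F2 {
  StateMap sm;                // 2^16 contexts: 8-bit history + partial byte
  unsigned char hist[256];    // last 8 bits seen under each partial byte
  int cxt;                    // partial byte: leading 1, then the coded bits

  Order0F2() : sm(0x10000), cxt(1) {
    for (int i = 0; i < 256; ++i) hist[i] = 0;
  }

  int predict() {                        // p(next bit = 1), 16-bit scale
    return sm.p(hist[cxt] << 8 | cxt);
  }

  void update(int y) {                   // after coding bit y
    sm.update(y);
    hist[cxt] = hist[cxt] * 2 + y;       // shift the bit into the history
    cxt = cxt * 2 + y;                   // extend the partial byte
    if (cxt >= 256) cxt = 1;             // byte finished, start over
  }
};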

  12. #12
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Very Very Nice! Hi Matt!

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    http://shelwien.googlepages.com/fpaq0f2.htm

    Make it a proper order1 already

  14. #14
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney
I also experimented with more complex bit histories like using a counter for long runs of zeros or ones, but it didn't help any.
Aren't long runs already modeled well enough by your state machine?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  15. #15
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Shelwien
    http://shelwien.googlepages.com/fpaq0f2.htm

    Make it a proper order1 already

Hmm, what about input like this:

qqwweerrttyyuuiioopp

A normal order-n coder won't compress it (where n can be 0, 1, 2 or anything), but fpaq0f should deal with it nicely (up to 50% compression in such cases) thanks to the SSE (bit history).

The SSE will figure out that values which have already appeared 2*n times aren't compressible, while the others are just repeats of the value from the previous position.


Matt:

instead of:
Code:
Predictor::Predictor(): cxt(0), sm(0x10000) {
  for (int i=0; i<0x100; ++i)
    state[i]=0x66;
}

you can use:
Code:
Predictor::Predictor(): cxt(0), sm(0x10000) {
  for (int i=0; i<0x100; ++i)
    state[i]=1;
}

and instead of:
Code:
int& st=state[cxt];
(st+=st+y)&=255;

you can use:
Code:
int& st=state[cxt];
st+=st+y;
if (st >= 256) {
  st &= 255;
  st |= 128;
}

That will reduce the bit history length to 7 bits, but it should improve compression on small files (at least it did when I used it in my TarsaLZP, though my TarsaLZP does better with state maps, so I don't know whether it will help in this case).

  16. #16
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Quote Originally Posted by Shelwien
    Make it a proper order1 already
    fpaq0 (stationary model) is the only proper order 0

    Quote Originally Posted by donkey7
That will reduce the bit history length to 7 bits, but it should improve compression on small files (at least it did when I used it in my TarsaLZP, though my TarsaLZP does better with state maps, so I don't know whether it will help in this case).
Using smaller bit histories works better on smaller files in general. For enwik8 I think around 9 or 10 bits was optimal. Also, with your method you have 32K contexts (all those shorter than 7 bits) that are reached only once, so it is useless to model them. That's why a simple 8 bit history works better than the paq8 state table. If you draw a 2-D map of the states (0-count and 1-count on the x and y axes) then it has the shape of a hyperbola with boundary (0-count)*(1-count) = constant. All the states below the boundary are reached at most once, so there is no point in modeling them.

    Modeling short bit histories works much better in higher order contexts or with context mixing where the short histories are modeled many times.

  17. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    >> Make it a proper order1 already
    >
    > fpaq0 (stationary model) is the only proper order 0

    I said _order1_ there. And meant that fpaq0f2 is close to plain
    order1 coder, but worse and slower.

  18. #18
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
Is fpaq0f2 an order0 coder or an order1 coder?

  19. #19
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Matt Mahoney
    Also, with your method you have 32K contexts (all those shorter than 7 bits) that are reached only once, so it is useless to model them.
Oh, I forgot that I tested it with LZP. I have millions of LZP entries which are often replaced with new ones, so most often we have short histories (for less frequent and less reliable contexts) or long runs (for more frequent, more reliable contexts). And the SSE input in my program is 8 bits of state and 3 bits of flag bit history - so there are only 2048 SSE entries and the SSE adapts fairly fast.

    Quote Originally Posted by Shelwien
    I said _order1_ there. And meant that fpaq0f2 is close to plain
    order1 coder, but worse and slower.
Are you sure?

Try an experiment: swap every pair of bytes and then compress.

i.e.:

Code:
for (i = 0; i + 1 < n; i += 2)
  swap(a[i], a[i + 1]);

or slightly permute the input, i.e.:
Code:
for (i = 2; i < n; i++)
  swap(a[i], a[i - rand() % 3]);

That should degrade performance more for an order-1 coder than for fpaq0f.

  20. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    It does degrade compression in fpaq0f2

    enwik8 = 56,916,872
    byte pairs swapped = 57,852,811

  21. #21
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
Can you also test http://shelwien.googlepages.com/o1plain.exe ?

I know that swapping slightly damages the bit histories, but damaging the context in an order-1 coder should hurt much more.

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    Code:
     
        Codec    Datasize  Ctime   Dtime   Metric 
    o1plain-0    33089287  15.330  17.061  5356.1  original files 
    fpaq0f2-2-0  33874788  14.844  17.253  5480.3 
    o1plain-1    33310889  15.734  16.908  5389.6  bitrev 
    fpaq0f2-2-1  37000057  15.171  18.000  5976.4 
    o1plain-2    41049526  16.001  17.656  6606.5  byteswap1 (swap odd/even bytes) 
    fpaq0f2-2-2  35584499  14.859  17.233  5747.3 
    o1plain-3    41686459  16.280  17.752  6707.3  byteswap2 ("random") 
    fpaq0f2-2-3  35945529  14.843  17.265  5804.0

    http://shelwien.googlepages.com/o1plain.rar
    o1plain, byteswap1-2 source/binaries and full html comparison are in there.

> I know that swapping slightly damages the bit histories, but damaging the context
> in an order-1 coder should hurt much more.

Well, you're right, so I guess we'd have to make it a mix of both contexts.
And still I think that:
    1. Byte alphabet permutations are much more common
    2. It isn't fair to compare fpaq0f with real order0 coders
    3. For BWT data fpaq0f might be really better than normal order1

  23. #23
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
fpaq0f2 IS NOT AN ORDER 1 CODER.

It doesn't take the previous byte as an additional context. Instead, if we are encoding the n-th bit of the current byte, then as a context it takes the last eight bytes that had the same (n - 1)-bit prefix as the current byte (to be precise: it takes only one bit out of every such byte) - that's called a bit history.


By the way:
A plain order-0 coder should encode a full byte at once, not every bit separately.

The fpaq* coders code bits separately, which means that previous bytes with the same prefix but different from the current byte affect the probabilities of the current byte (the bits of the current byte). So I say that it's not normal order-0 coding, as an order-0 coder should encode full bytes independently.

With a normal order-0 coder, permuting small substrings (up to 20 or 200 symbols, depending on adaptation speed) made of distinct symbols shouldn't affect the compression ratio.


Edit:
Your 'random' function looks like a good joke

  24. #24
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    > fpaq0f2 *IS NOT AN ORDER 1 CODER*.

It is, if we'd call an "order1 coder" something that uses an additional
8 bits of context besides the bits in the same byte.
And such a definition makes some sense, as it correlates with compression
speed and quality.

    > plain order-0 coder should encode full byte at once, not every bit separately.

I'd agree on that, as I prefer byte coders myself, but I intentionally made
these coders bit-oriented for a fair comparison with the fpaq0 series.
In fact, a similar byte-oriented coder would be at least faster and
probably better too; that's certain for texts.

> The fpaq* coders code bits separately, which means that previous bytes with the
> same prefix but different from the current byte affect the probabilities of the
> current byte (the bits of the current byte). So I say that it's not normal
> order-0 coding, as an order-0 coder should encode full bytes independently.

I still think that doesn't matter, as it's still possible to calculate the
probabilities for all 256 byte values using the bit model and encode the whole
byte at once.
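
For illustration, one way to do that (a hedged sketch; bit_p(node) stands for any binary model's p(bit = 1) given the partial byte coded so far):
Code:
#include <functional>
#include <vector>

// Multiply the per-bit probabilities along each path of the bit tree to get
// the probability of every byte value; the 256 results sum to 1, so the
// whole byte can be arithmetic-coded at once.
std::vector<double> byte_distribution(const std::function<double(int)>& bit_p) {
  std::vector<double> p(256, 1.0);
  for (int c = 0; c < 256; ++c) {
    int node = 1;                        // root: leading 1, no bits fixed yet
    for (int k = 7; k >= 0; --k) {
      int bit = (c >> k) & 1;
      double p1 = bit_p(node);           // model's p(bit = 1) at this node
      p[c] *= bit ? p1 : 1.0 - p1;
      node = node * 2 + bit;
    }
  }
  return p;
}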

> With a normal order-0 coder, permuting small substrings (up to 20 or 200 symbols,
> depending on adaptation speed) made of distinct symbols shouldn't affect the
> compression ratio.

    Don't understand that.

    > your 'random' function looks like a good joke

    Well, I obviously couldn't use rand() as it won't be repeatable
    across different builds and such, and it doesn't matter on that level
    anyway.

  25. #25
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Shelwien
Well, I obviously couldn't use rand() as it won't be repeatable
across different builds and such, and it doesn't matter on that level
anyway.
It will be repeatable unless you feed srand() with a random value. And it is important to use a good rand().

    Quote Originally Posted by Shelwien
I still think that doesn't matter, as it's still possible to calculate the
probabilities for all 256 byte values using the bit model and encode the whole
byte at once.
That doesn't change anything (even the size of the encoded file will be identical); it just makes things unnecessarily complex (i.e. it still won't be a normal order-0 coder).

    Quote Originally Posted by Shelwien
> With a normal order-0 coder, permuting small substrings (up to 20 or 200 symbols,
> depending on adaptation speed) made of distinct symbols shouldn't affect the
> compression ratio.
Don't understand that.
Well, forget it, it's not so important.

  26. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    >> Well, I obviously couldn't use rand() as it won't be repeatable
    >> across different builds and such, and it doesn't matter on that level
    >> anyway.
    >
    > it will be repeatable until you feed srand() with a random value.

I doubt that it will be repeatable even if I add srand(0) there.
The rand() implementation is not standardized, so you won't be able
to repeat my IntelC build's result with e.g. gcc.

    > and it is important to use good 'rand'.

I think it won't change anything here.
But feel free to check for yourself.

    >> I still think that doesn't matter as its still possible to calculate all
    >> the probabilities for 0..255 values using bit model and encode the whole
    >> byte at once.
    >
> That doesn't change anything (even the size of the encoded file will be
> identical); it just makes things unnecessarily complex (i.e. it still won't be
> a normal order-0 coder).

It's a terminology question, so it doesn't really deserve an argument, but binary
and byte implementations have the same computational complexity and compression
quality, so I think they are both perfectly normal order0 coders.

And I'd classify fpaq0f2 as an order1 coder in that sense.

  27. #27
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
Well, I can't classify fpaq0f2 as an order-1 coder; that's illogical to me.

I've experimented with using fpaq0f as second-stage compression (after BWT). Here's the log from the console:

    Code:
     
    G:Piotrek>msufsort.3.1.1 h 
    MSufSort Demo 
    Version: 3.1 *BETA* 
    Authors: Michael A. Maniscalco 
             Simon J. Puglisi 
    Date:    February 3, 2007 
     
    This is a BETA version of the algorithm. 
    Do not use except for test purposes. 
     
    Options: 
    b|B             Compute the BWT on the input file 
    u|U             Reverse the BWT on the input file 
    s|S             Compute the suffix array of the input file 
    h|H|?           Help: Usage, version info, authors etc. 
     
    Example usage: 
    MSufSort.exe b myFile.txt myFile.bwt 
    MSufSort.exe u myFile.bwt myFile.txt 
    MSufSort.exe s myFile.txt myFile.sa 
    MSufSort.exe h 
     
    G:Piotrek>msufsort.3.1.1 b enwik8 enwik8.bwt 
    Elapsed time: 177.547 seconds 
     
     
     
    G:Piotrek>fpaq0p  c enwik8.bwt enwik8.0p 
    enwik8.bwt (100000004 bytes) -> enwik8.0p (23809591 bytes) in 16.13 s. 
     
    G:Piotrek>fpaq0f  c enwik8.bwt enwik8.0f 
    enwik8.bwt (100000004 bytes) -> enwik8.0f (21563502 bytes) in 33.63 s. 
     
    G:Piotrek>fpaq0f2 c enwik8.bwt enwik8.0f2 
    enwik8.bwt (100000004 bytes) -> enwik8.0f2 (21798843 bytes) in 25.19 s. 
     
    G:Piotrek>ppmy   /o1 /m512 enwik8.bwt enwik8.1 
    PPM210 Demo v2.02(3c), Encoder (c) 2000-2002  Eugene Shelwien 
    Use /? to get some help. 
    Tree Node size = 16 
    Model MaxOrder = 1 
    DictionarySize = 512M 
    Inp=100000004 Out=63508306 M6608 O2 
    You'd just wasted 314.000s 
     
    G:Piotrek>ppmy   /o2 /m512 enwik8.bwt enwik8.2 
    PPM210 Demo v2.02(3c), Encoder (c) 2000-2002  Eugene Shelwien 
    Use /? to get some help. 
    Tree Node size = 16 
    Model MaxOrder = 2 
    DictionarySize = 512M 
    Inp=53878784 Out=22610228 M590224 O3    ^C 
    G:Piotrek>ppmy45 /o2 /m512 enwik8.bwt enwik8.245 
    PPM210 Demo v2.02(3c+sse), Encoder (c) 2000-2003  Eugene Shelwien, Serge Osnach 
    Use /? to get some help. 
    Tree Node size = 16 
    Model MaxOrder = 2 
    DictionarySize = 512M 
    Inp=11553280 Out=4921618 M333968 O3    ^C 
    G:Piotrek>ppmy70 /o2 /m512 enwik8.bwt enwik8.270 
    PPM210 Demo v2.02(3c+sse), Encoder (c) 2000-2003  Eugene Shelwien, Serge Osnach 
    Use /? to get some help. 
    Tree Node size = 16 
    Model MaxOrder = 2 
    DictionarySize = 512M 
    Inp=20140032 Out=8682405 M480640 O3    ^C 
    G:Piotrek>

Most of the time msufsort spent on disk thrashing. I'm on my mate's computer - it's a Sempron 2500+ with 768 MB RAM.

ppmy has laughable performance here - I don't know if that's normal.

o1plain can't run on this system - it prints that it runs only on Pentium 4 processors.

Can anybody provide me with good order-1 coders and MTF and RLE preprocessors for further experiments? (Of course compiled and compatible with my system - they don't have to be compiled with exotic switch combinations.)

  28. #28
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
Okay, I also tested UHBC's second-stage algorithms.

    Code:
     
    G:Piotrek>uhbc 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Usage: UHBC command infile outfile [switches..] 
     
    Commands: --------------------------------------------------------------------- 
      e        encode infile to outfile 
      d        decode infile to outfile 
    Switches: --------------------------------------------------------------------- 
     -b<size>  specify maximum block size in bytes, kilobytes (k) or megabytes (m) 
               e.g. -b16m or -b900k or -b4096, default: -b5m, maximum: -b128m 
     -m<mode>  choose second stage algorithms, default: -m2 
               -m0   RLE + direct entropy coding 
               -m1   RLE + MTF (Move To Front) + entropy coding 
               -m2   RLE + WFC (Weighted Frequency Count) + entropy coding 
               -m3   RLE + DWFC (Double Weighted Frequency Count) + entropy coding 
     -s        process input with second stage algorithms only (don't perform BWT) 
     -d        disable adaptive switching between MTF/WFC and direct coding 
     -c<ctxt>  choose contexts for BWT sorting, default: auto-selection 
               -cf   use following contexts 
               -cp   use preceding contexts 
    -------------------------------------------------- contact: uwe.herklotz@gmx.de 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m0  -s -m0 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m0" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + direct entropy coding 
     
    Encoding done. 100000004 --> 21834940 bytes, ratio: 1.747 bpc, 22.088 sec 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m1  -s -m1 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m1" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + MTF + entropy coding 
     
    Encoding done. 100000004 --> 21662591 bytes, ratio: 1.733 bpc, 30.000 sec 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m2  -s -m2 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m2" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + WFC + entropy coding 
     
    Encoding done. 100000004 --> 21046891 bytes, ratio: 1.684 bpc, 42.637 sec 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m2d -s -m2 -d 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m2d" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + WFC + entropy coding (no switching) 
     
    Encoding done. 100000004 --> 21055567 bytes, ratio: 1.684 bpc, 34.945 sec 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m3  -s -m3 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m3" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + DWFC + entropy coding 
     
    Encoding done. 100000004 --> 20933540 bytes, ratio: 1.675 bpc, 53.516 sec 
     
    G:Piotrek>uhbc e enwik8.bwt enwik8.m3d -s -m3 -d 
     
    UHBC 1.0  ----   high performance blocksorting compressor   ----   BETA version 
    Copyright (c) 2002-2003 by Uwe Herklotz     All rights reserved     Jun 30 2003 
    >>>> **** FOR TESTING AND EVALUATION ONLY **** NOT FOR COMMERCIAL USE **** <<<< 
     
    Input file  : "enwik8.bwt" 
    Output file : "enwik8.m3d" 
    Block size  : 5.00 MB, used memory: 40.25 MB 
    Algorithms  : RLE + DWFC + entropy coding (no switching) 
     
    Encoding done. 100000004 --> 20935286 bytes, ratio: 1.675 bpc, 36.538 sec 
     
    G:Piotrek>


fpaq0f does better without RLE and MTF than UHBC's arithmetic coder does with RLE and MTF.

  29. #29
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
> ppmy has laughable performance here - I don't know if that's normal.

More or less, though it spends most of its time printing the statistics and
piping bytes through getc/putc.
You could use ash at least - it has the same options.
    http://compression.ru/sh/ash04.rar
    http://compression.ru/sh/ash04a.rar

> o1plain can't run on this system - it prints that it runs only on Pentium 4
> processors.

    Here's SSE1-compatible version
    http://shelwien.googlepages.com/o1plain_p3.exe
    Also it has fpaq-like interface now (o1plain_p3 c/d input output)

> Can anybody provide me with good order-1 coders and MTF and RLE preprocessors
> for further experiments?

    Well, just for fun I found some more ancient utils...
    http://shelwien.googlepages.com/dc-kit2c.rar
    http://www.compression.ru/sh/dc-kit1.rar

  30. #30
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    o1plain_p3 compresses:

enwik8 to 56,631,067 bytes
enwik8.bwt to 27,859,807 bytes

Poor compression again.

Those utils don't work on my computer; I will try more later.
