
Thread: Large text benchmark

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Over the last week I updated the large text benchmark with some candidate programs (not in main table yet): symbra, lpaq8, lpaq8e, lcssr, fpaqa, hook 1.3, lzpm 0.13, cmm1, cmm2, fpaqb, fpaq0m, bit 0.1. cmm1 compresses enwik9 better than cmm2 and is over 30 days old so it is ranked in the main table.
    http://cs.fit.edu/~mmahoney/compression/text.html

    fpaqa and fpaqb are my experimental implementations of Jarek Duda's asymmetric binary coder, which he proposed as an alternative to arithmetic coding. However, my implementation is slower, or else I would use it in other programs. As an experiment I tried the coder from fpaqb (my newer version) in lpaq1. It worked with only minor changes, but it was 1% slower and compressed 0.01% larger, so I saw no advantage to using it. An asymmetric coder requires dividing the input into blocks, saving the bit predictions, and coding them in reverse order. The extra space is a 3-byte block header for each 64 KB input block, plus 2.5 MB of memory to save the predictions and then reverse the compressed block. Decompression doesn't need to reverse anything, so it doesn't use any extra memory.
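    To make the block/reverse-order structure concrete, here is a toy binary rANS coder, a later member of the same asymmetric-coder family. This is a from-scratch illustration, not fpaqb's actual code; the names (Coded, encode, decode) and the 16/24-bit state bounds are my own choices. It shows the property described above: bits are coded in reverse order with bytes written back-to-front (like fpaqb's *--p=w), while decoding runs strictly forward and never reverses anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy binary rANS coder (same ABS family). P(bit=1) is quantized to
// q/4096 with 12-bit precision, as in fpaqb.
constexpr int N = 12;              // probability precision
constexpr uint32_t M = 1u << N;    // probability scale (4096)
constexpr uint32_t L = 1u << 16;   // state is kept in [L, 256*L)

struct Coded { std::vector<uint8_t> bytes; uint32_t state; };

Coded encode(const std::vector<int>& bits, const std::vector<uint32_t>& q) {
    std::vector<uint8_t> out;
    uint32_t x = L;
    for (int i = int(bits.size()) - 1; i >= 0; --i) {   // REVERSE order
        uint32_t f = bits[i] ? q[i] : M - q[i];         // frequency, in [1, 4095]
        uint32_t base = bits[i] ? M - q[i] : 0;         // cumulative frequency
        while (x >= (f << N)) {    // renormalize so the step below stays < 256*L
            out.push_back(uint8_t(x));                  // emit low byte
            x >>= 8;
        }
        x = (x / f) * M + (x % f) + base;
    }
    return {out, x};
}

std::vector<int> decode(Coded c, std::size_t n, const std::vector<uint32_t>& q) {
    std::vector<int> bits(n);
    uint32_t x = c.state;
    for (std::size_t i = 0; i < n; ++i) {               // forward order
        uint32_t slot = x & (M - 1);
        int bit = slot >= M - q[i];                     // same shape as fpaqb's d test
        bits[i] = bit;
        uint32_t f = bit ? q[i] : M - q[i];
        uint32_t base = bit ? M - q[i] : 0;
        x = f * (x >> N) + slot - base;
        while (x < L) {                                 // refill, newest byte first
            x = (x << 8) | c.bytes.back();
            c.bytes.pop_back();
        }
    }
    return bits;
}
```

Because each decode step is the exact inverse of the corresponding encode step, the decoder needs no extra buffer, which matches the observation above that only compression pays the memory cost.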

    fpaqa uses lookup tables and no multiplication or division. In theory it would be faster than arithmetic coding on old hardware, but on newer machines multiplication is faster than a table lookup. fpaqb uses only one small table of inverses to replace division with a multiplication during compression and no table during decompression. This is why decompression is faster than compression (but still slower than arithmetic coding). If I could figure out how to make it faster, I would use it in all my programs (paq8, bbb, sr2, lpaq, etc). fpaqb compresses smaller, uses less memory, and is simpler and faster than fpaqa, but not fast enough.
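    The "small table of inverses" idea can be sketched as follows. This is a reconstruction under assumed bounds (a 36-bit shift, 12-bit q, 24-bit x), not fpaqb's actual table or shift count.

```cpp
#include <cassert>
#include <cstdint>

// Replace division by q with one multiply by a precomputed fixed-point
// reciprocal m = ceil(2^36 / q). For q in [1, 4095] and x < 2^24 this
// is exact: the rounding error e = m*q - 2^36 satisfies e < q < 2^12,
// so x*e < 2^24 * 2^12 = 2^36 can never push the product past the
// next multiple of 2^36.
constexpr int S = 36;

uint64_t inverse(uint32_t q) {                 // q in [1, 4095]
    return ((uint64_t(1) << S) + q - 1) / q;   // ceil(2^36 / q)
}

uint32_t div_by_mul(uint32_t x, uint64_t m) {  // exact for x < 2^24
    return uint32_t((uint64_t(x) * m) >> S);
}
```

One table entry per possible quantized probability suffices, which is why the table can stay small and why only the compressor (which divides) needs it at all.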

  2. #2 Shelwien (Administrator)
    Btw, an asymmetric implementation of arithmetic coding has some benefits too...
    I mean, storing a block of bits/probabilities during encoding and generating
    a block of code in a separate loop.
    1. With some redundancy you can sort bits by quantized probability and
    store groups with the same probability into separate blocks, with a reduced
    number of operations.
    2. It's also possible to compare the number of bits to the approximated
    codelength and store the given block without compression if it was ineffective.

    Also, considering ABS, I think it might be possible to implement it
    in a rangecoder way, by replacing
    x = (w*Q)/q
    with
    xlow = w*(Q/q)
    x = (w*(Q%q))/q
    but like that it'll lose whatever small benefits it has...

    It's much fun that such a thing as ABS is still possible though. Thanks
    for implementing it!

  3. #3 Matt Mahoney (Expert)
    Quote Originally Posted by Shelwien
    1. With some redundancy you can sort bits by quantized probability and store groups with the same probability into separate blocks, with a reduced number of operations.
    Yes, but how do you unsort the bits to decompress?

    Quote Originally Posted by Shelwien
    2. It's also possible to compare the number of bits to the approximated
    codelength and store the given block without compression if it was ineffective.
    Good idea. Right now I use a second buffer to reverse the compressed data. I make this twice as big as the input block just in case it is expanded, but there is still a theoretical possibility of overflow.

    Also, I posted lpaq1a = lpaq1 model + fpaqb coder. Performance is very slightly worse: 1% slower, 0.02% larger, 1.25 MB extra memory for compression. So I need good ideas like yours to improve it.
    http://cs.fit.edu/~mmahoney/compression/#lpaq

  4. #4 Shelwien (Administrator)
    > > 1. With some redundancy you can sort bits by quantized probability and
    > > store groups with the same probability into separate blocks, with a reduced
    > > number of operations.
    >
    > Yes, but how do you unsort the bits to decompress?

    Well, that's why I said "with some redundancy". You'd need to store the subblocks
    in some order, so subblock length encoding would be needed. Also, knowing the
    length of the arithmetic code allows some tail cutting, so it might even be
    more effective this way.

    Another thing that I used was a carry-in-block flag for faster carryless
    decoding: you can use a faster decoding routine if you know that there will be
    no carry to fix.

    > > 2. It's also possible to compare the number of bits to the approximated
    > > codelength and store the given block without compression if it was ineffective.
    >
    > Good idea. Right now I use a second buffer to reverse the compressed data.
    > I make this twice as big as the input block just in case it is expanded,
    > but there is still a theoretical possibility of overflow.

    Well, my implementation compared the weighted codelength vs the bit count and
    turned the probability into 0.5 if redundancy was detected. That way you can
    ensure there's some upper limit on the codelength without actually
    splitting the file into blocks. Guess you'd consider it slow though.
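    A sketch of the bookkeeping Shelwien describes (the names and the decision rule are illustrative, not his actual code): accumulate the model's idealized codelength and compare it with the raw bit count. Forcing p = 0.5 costs exactly 1 bit per bit, i.e. the block is effectively stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Idealized codelength of a block: sum of -log2 P(actual bit).
double estimated_bits(const std::vector<int>& bits,
                      const std::vector<double>& p1) {  // p1[i] = P(bit i == 1)
    double total = 0.0;
    for (std::size_t i = 0; i < bits.size(); ++i)
        total += -std::log2(bits[i] ? p1[i] : 1.0 - p1[i]);
    return total;
}

// True when the model loses against a raw copy of the block, i.e. the
// coder should fall back to p = 0.5 / stored output.
bool should_store_raw(const std::vector<int>& bits,
                      const std::vector<double>& p1) {
    return estimated_bits(bits, p1) >= double(bits.size());
}
```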

  5. #5 donkey7 (Member)
    I've read fpaqb's decoding routines.

    1.
    int d=(xq&((1<<N)-1))>=(1<<N)-q;
    can be changed to
    int d = ((xq & (1 << N) - 1) + q) >> N;
    if I understand the coder correctly.

    2. You can merge the update of the predictor and the update of x into a single if, i.e.:
    if (d)
    t[cxt]+=(65536-t[cxt])>>5, x=(xq>>N)+1;
    else
    t[cxt]-=t[cxt]>>5, x-=(xq>>N)+1;
    but it requires interface changes.

    3.
    q+=q<2048;
    Is that really needed? I would remove it.

    4. Could the while loop in renormalization be changed to an if statement? If I recall correctly the minimum probability is 1/256, so at most one byte is taken.
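    For what it's worth, the rewritten test in point 1 does compute the same bit for all in-range inputs: since both slot = xq & ((1<<N)-1) and q are below 2^N, their sum stays below 2^(N+1), so the shifted sum is 1 exactly when slot >= (1<<N) - q. A brute-force check (a later post in this thread finds the form slower in practice, not incorrect):

```cpp
#include <cassert>
#include <cstdint>

// Compares d = (slot >= (1<<N) - q) with d = ((slot + q) >> N) over
// every N-bit slot and every q in [1, 2^N).
bool forms_agree(int N) {
    const uint32_t M = 1u << N;
    for (uint32_t slot = 0; slot < M; ++slot)
        for (uint32_t q = 1; q < M; ++q) {
            int d1 = slot >= M - q;
            int d2 = int((slot + q) >> N);  // sum < 2^(N+1), so shift gives 0 or 1
            if (d1 != d2) return false;
        }
    return true;
}
```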

  6. #6 donkey7 (Member)
    And the encoder:

    1. You can change:
    1-(q&1)
    to
    ~q&1
    2. Why are you checking the
    p>outs+10
    condition in the while loop? I suppose it's redundant. If it does do something, you can move that check inside the renormalization part, i.e. just after the line:
    *--p=w;
    because p is changed only in that line.

    3. Again, I don't understand what this line is for:
    q+=q<2048;

  7. #7 donkey7 (Member)
    Sorry for the triple post, but I'm lazy ;p

    Why do you store the block size in the archive? You are using an EOF bit after every symbol to mark the end of the file, so any file-size indicators are redundant.

    In the decoder's constructor initialize n to 1 and change the decode routine from:
    while (n <= 0)
    {
    [read x from stream]
    [read n from stream]
    }
    [decoding]
    --n;
    to:
    if (!(--n))
    {
    [read x from stream]
    n = B;
    }
    [decoding]
    You're using while loops in places where a simple if statement would be sufficient.
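    The suggested counter pattern, extracted into a stand-alone toy (B, the state layout, and the reader names are hypothetical stand-ins for fpaqb's internals): n starts at 1 so the very first call fetches, and after that one fetch happens per B items with a single decrement-and-test instead of a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counter-based refill: one branch per item, one fetch per B items.
struct BlockReader {
    const std::vector<uint32_t>& states;  // pre-read state words
    std::size_t pos = 0;
    int n = 1;                            // forces a fetch on the first call
    uint32_t x = 0;
    static constexpr int B = 4;           // items per state word (made up)

    uint32_t next() {
        if (!(--n)) {                     // a single if replaces the while loop
            x = states[pos++];
            n = B;
        }
        return x;                         // a real decoder would decode a bit from x here
    }
};
```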

  8. #8 donkey7 (Member)
    Also, we should note that a hand-written assembly decoder (with tricks like carry flags and mixed 16/32-bit operations) should work faster with 16-bit precision than with 12-bit precision.


    Merry Christmas!

  9. #9 Matt Mahoney (Expert)
    Thanks for the suggestions. I implemented most of the optimizations in fpaqc and it is faster. http://cs.fit.edu/~mmahoney/compression/#fpaq0

    The statement q += q<2048 is necessary because if q is 0 then you are doomed. It doesn't happen with this model, but in general it could (e.g. lpaq1a). Also, you do need a while loop in the compressor, because if a bit is predicted with probability less than 1/256 it could output 2 bytes. I didn't move the x update to the predictor because I wanted to keep the same interface.

  10. #10 donkey7 (Member)
    Hmm, I thought it would have made a bigger difference, but it's still better than nothing.

    Please change my nickname to my real name, Piotr Tarsa, in the documentation.

    There are still more ways to speed up the ABS. For example, you can force the block size to be divisible by 8 (or 9 if you store an additional escape bit for each byte) and check for flushing or reading a block on byte boundaries (not at every bit).

    One more thing, regarding only fpaq and other escape-based coders: you can give the escape code special handling, i.e. set a fixed probability for it (the probability of a 1 should be 1, the lowest possible probability) and code it independently. This way you can:

    - save model space: currently the model uses only half of the entries because of the escape bit included in the context. Without it there are no redundant entries in the model (except the zero entry, which is not used).
    - save coder space: if the probability and the bit (it'll always be zero before each byte) are known, then they don't need to be stored on the stack.
    - make simplified and fast separate routines for coding escapes.


    To summarize (assuming we have a fixed probability for escape codes):

    1. The whole decodeescape() function:
    decodeescape()
    {
    int d=!(x&((1<<N)-1));
    x-=(x-1>>N)+1;
    while (x<1<<N)
    x=(x<<8)+getc(archive);
    return d;
    }
    Note that the value of x will be invalid after a call to this function if we decoded bit 1, but that will happen only at the last bit (the escape bit, to be exact), so it is irrelevant for us (as we don't have anything more to decode).

    2. We skip encoding the escape bit before every byte in the main function. Instead we modify the flush function: before the line
    int q=ins[--n];
    we place the code:
    if(!(n&7))
    {
    while (1) {
    x=(1+w)*qinv[2]-1>>32;
    if (x<256<<N) break;
    *--p=w;
    w>>=8;
    }
    w=x;
    }
    3. We must encode the final escape bit with another function, because of the modified flush function. It will look like this:
    while (1) {
    x=(w)*qinv[3]-1>>32;
    if (x<256<<N) break;
    *--p=w;
    w>>=8;
    }
    w=x;

  11. #11 Member
    In case anybody was interested, I ran fpaq, fpaq0p, fpaqb and fpaqc through a very quick test (on my test set):

    app name, comp time (s), decomp time, result size (approx.):
    fpaq: 8.3, 8.1, 24.2 MB
    fpaq0p: 4.3, 4.0, 21.9 MB
    fpaqb: 9.1, 5.6, 21.8 MB
    fpaqc: 8.5, 5.9, 21.8 MB

    (Which one of the fpaqs is actually the direct competitor to the ABS ones?)

  12. #12 donkey7 (Member)
    fpaqa, fpaqb and fpaqc are the ABS ones.

    Oops, it seems that my first decoding optimization actually hurts.

    int d = ((xq & (1 << N) - 1) + q) >> N;
    is not good.

    You can try:
    int d = ((xq & (1 << N) - 1) + q) >= 1 << N;
    or
    int d = ((xq + q) ^ xq) >= 1 << N;
    and also test whether:
    xq=(xq>>N)+1;
    if (d) x=xq;
    else x-=xq;
    is slower than:
    if (d) x=(xq>>N)+1;
    else x+=~(xq>>N);
    or even make:
    if (((xq + q) ^ xq) >= 1 << N)
    {
    predictor.update(1);
    x = (xq >> N) + 1;
    while (x<1<<N)
    x=(x<<8)+getc(archive);
    return 1;
    }
    else
    {
    predictor.update(0);
    x += ~(xq >> N);
    while (x<1<<N)
    x=(x<<8)+getc(archive);
    return 0;
    }
    Make the predictor class functions inline.

  13. #13 encode (The Founder)
    By the way, now we can change from:
    Code:
      void Update(int y) { 
        if (y) { 
          p1 += ((65536 - p1) >> 3); 
          p2 += ((65536 - p2) >> 6); 
        } 
        else { 
          p1 -= (p1 >> 3); 
          p2 -= (p2 >> 6); 
        } 
      }

    to
    Code:
      void Update(int y) { 
        if (y) { 
          p1 += ((unsigned short)~p1 >> 3); 
          p2 += ((unsigned short)~p2 >> 6); 
        } 
        else { 
          p1 -= (p1 >> 3); 
          p2 -= (p2 >> 6); 
        } 
      }
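    One nuance worth noting about this substitution: (unsigned short)~p1 equals 65535 - p1, not 65536 - p1, so the shifted increment can come out one smaller whenever 65536 - p1 is an exact multiple of 8 (or 64 for p2). The drift is bounded by 1 per update; a quick exhaustive check over all 16-bit probabilities:

```cpp
#include <cassert>
#include <cstdint>

// Compares (65536 - p) >> shift with ((unsigned short)~p) >> shift,
// i.e. (65535 - p) >> shift, for every 16-bit p. They agree unless
// 65536 - p is a multiple of 2^shift, where the ~ form is 1 smaller.
bool increments_close(int shift) {
    for (uint32_t p = 0; p <= 65535; ++p) {
        uint32_t a = (65536 - p) >> shift;
        uint32_t b = (65535 - p) >> shift;
        uint32_t diff = a - b;
        if (diff > 1) return false;
        if ((diff == 1) != ((65536 - p) % (1u << shift) == 0)) return false;
    }
    return true;
}
```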

  14. #14 Member
    Quote Originally Posted by donkey7
    fpaqa, fpaqb and fpaqc are abs ones.
    I was rather asking which of all the other fpaqs is the direct competitor to these three. The ASM fpaq is too slow and the carryless fpaq0p is too fast anyway, so which version were those actually compared with?

  15. #15 donkey7 (Member)
    fpaqa, fpaqb and fpaqc use the model from fpaq0p (btw: invented by encode), so they are competitors to fpaq0p.

    fpaq0p is very simple, so it's fast.

    fpaqa, fpaqb and fpaqc are more complicated because of their asymmetric nature, and that needs to be optimized.

  16. #16 donkey7 (Member)
    fpaqd.zip
    Removed the escape codes - now it's more like fpaqb than fpaqc, i.e. the size of the last block is stored.
    This version assumes that the block size is fixed and contains an integral number of symbols, but that can be changed without hurting speed. You just must allow flexible block sizes and store the size of each block in the compressed file. This will allow storing an integral number of variable-length symbols (e.g. LZ codes).

    Instead of:
    q+=q<2048;
    I'm using:
    q=q|1;
    This hurts compression by 0.001% or so.

    Compiled with g++ with -O3 and minor optimizations (according to Dev-C++).

  17. #17 Matt Mahoney (Expert)
    Quick test with enwik8 - 2.2 GHz Athlon-64 3500+, WinXP (32 bit). Times are process times.

    fpaqc 61270455 26.3 18.8 sec
    fpaqd 63391013 25.4 24.2 sec
    fpaqd 63391013 25.7 24.7 sec (recompiled with -o2 and other options same as fpaqc)

    BTW fpaqa, fpaqb, fpaqc can all be compared with arithmetic coding by compiling with -DARITH

    fpaqc -DARITH 61280454 10.7 11.9

  18. #18 Matt Mahoney (Expert)
    Quote Originally Posted by donkey7
    please change my nickname to my real name, Piotr Tarsa in documentation
    No problem

  19. #19 donkey7 (Member)
    Hmm, on my system (Athlon XP 2200) with enwik8 the results are different (compressing and decompressing to nul, times reported by the program, no other activity):

    Code:
     
      ctime     dtime       size           name 
     
      35.89 s   23.83 s     61 270 455 b   fpaqc 
      27.28 s   15.58 s     61 272 808 b   fpaqd 
      16.94 s   14.73 s     61 457 810 b   fpaq0p

  20. #20 Member
    Athlon 64 X2 3800+ (2 GHz), XP SP1 32-bit, 12 testfiles, wall times, used shipped binaries

    Code:
            comp decomp (seconds) 
     fpaqc:  8.5,  5.9 
     fpaqd:  7.1,  4.2 
    fpaq0p:  4.3,  4.0

  21. #21 donkey7 (Member)
    Reuploaded another version to the same place.

    (Unrolled some loops as in sr2, moved flushing to 'main', moved the context to 'main'.)

  22. #22 Member
    Code:
             comp decomp (seconds) 
      fpaq0p: 4.3, 4.0 
    fpaqd_v1: 7.1, 4.2 
    fpaqd_v2: 6.2, 3.5
    speedy indeed!!

  23. #23 Nania Francesco (Tester)
    Very good speed-Up Piotr!

  24. #24 donkey7 (Member)
    On my todo list is support for storing incompressible blocks (I think I need to back up the model before each block and restore the model when an incompressible block is detected - this way decompression will be faster, but compression will be slower) and incorporating fpaqd into sr2.
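    The backup/restore plan can be sketched like this (a hypothetical outline, not fpaqd's code: the model, the toy update rule, and passing in an estimated codelength are all my stand-ins). The point is that compression pays for the snapshot, while the decompressor simply skips model updates for stored blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy adaptive model: one 16-bit counter per byte value.
struct Model { std::vector<uint16_t> t = std::vector<uint16_t>(256, 32768); };

// Process one block; if coding it would expand the output, roll the
// model back to its pre-block state and report "store raw", so the
// compressor and decompressor stay in sync.
bool code_block_with_rollback(Model& m, const std::vector<uint8_t>& block,
                              double estimated_bits) {
    Model saved = m;                                  // back up before any update
    for (uint8_t b : block)
        m.t[b] += (65536 - m.t[b]) >> 5;              // toy model update
    if (estimated_bits >= block.size() * 8.0) {       // expansion detected
        m = saved;                                    // restore; emit block stored
        return false;
    }
    return true;                                      // emit block compressed
}
```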

  25. #25 Matt Mahoney (Expert)
    Quote Originally Posted by Matt Mahoney
    Quick test with enwik8 - 2.2 GHz Athlon-64 3500+, WinXP (32 bit). Times are process times.

    fpaqc 61270455 26.3 18.8 sec
    fpaqd 63391013 25.4 24.2 sec
    fpaqd 63391013 25.7 24.7 sec (recompiled with -o2 and other options same as fpaqc)

    BTW fpaqa, fpaqb, fpaqc can all be compared with arithmetic coding by compiling with -DARITH

    fpaqc -DARITH 61280454 10.7 11.9
    New results with fpaqd

    61,272,808 17.4 10.7 (median of 3 runs)

  26. #26 Moderator
    Thanks Matt, and donkey7!

  27. #27 donkey7 (Member)
    q+=q<2048;
    Finally I found an elegant workaround for that.

    Just write:
    Code:
      qinv[0] = qinv[2]; 
      qinv[1] = qinv[3];

    This would be equivalent to:
    Code:
      q += q < 1;

    but there will be no overhead in the main loop.

  28. #28 Matt Mahoney (Expert)
    Yes, but you still need it in the decoder.

  29. #29 chornobyl (Member)
    Can you add to the Large Text Benchmark a column with the compressed_size/compression_time ratio,
    calculated in this simple way and NOT the way maximumcompression.com does it?

  30. #30 donkey7 (Member)
    It's easy to guess that for such a formula the fast thor modes (e1, e2, e3) will rule among the fast compressors and FreeArc will rule among the efficient compressors.


