Results 1 to 22 of 22

Thread: Symbol ranking compression

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I have been experimenting with symbol ranking compression and wrote a new file compressor, sr2 (and decompressor unsr2). Code is GPL.
    http://cs.fit.edu/~mmahoney/compression/text.html# 2739

    The source explains the details, but it is loosely based on Peter Fenwick's SRANK, but with modern improvements like arithmetic coding. It is about as fast as zip -9 but compresses a little smaller. However decompression is a bit slower than compression. I optimized it for speed rather than size. It is ranked 56th on the large text benchmark (zip is 66th) but compresses enwik9 in 99 seconds (zip -9 takes 123 seconds). LZ77 beats it for decompression speed, though.

    The basic idea is to map a 20 bit hash of an order 4 context to a MTF queue of the last 3 bytes in that context plus a consecutive hit count. The queue position or literal (for a miss) is arithmetic coded using an order 1 model (order 0 for counts > 3) with the count as extra context. After a byte is coded it is moved to the front of the queue. The count (max 63) is for consecutive hits at the front of the queue. It is reset to 1 for other positions or 0 for misses.

    SRANK, written in 1996-97, is about twice as fast but gets poor compression. It used Huffman coding with no context modeling. I called my program SR2 because it is only the second symbol ranking compressor that I know of (other than MTF modeling in BWT).

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Interesting work!

    By the way, Matt, for a long time I'm looking for your arithmetic encoder implementation. In my latest programs like LZPM I use a quite similar thing.

    However, one thing for sure you can change in your coder/decoder:

    while (((x1^x2)&0xff000000)==0) { // pass equal leading bytes of range

    may be changed to:

    while ((x1^x2)<(1<<24)) { // pass equal leading bytes of range

    In addition, you can flush all bytes in encoder, to avoid EOF checking in decoder and allow mixed data in the file like header + compressed data + header + compressed data ... In other words, in this case any data after compressed stream can be written safely.

    void Encoder::flush() {
    for (int i=0; i<4; ++i) {
    putc(x2>>24, archive);
    x2<<=8;
    }
    }

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I'll have to try that in my next version to see if it's faster. But I did an experiment with gcc -S and it looks like both forms of the while loop should compile to the same number of instructions. If I write

    int f(unsigned int x1, unsigned int x2, int a, int b) {
    return ((x1^x2)&0xff000000)==0 ? a : b;
    }

    and compile with gcc -O2 -S -fomit-frame-pointer -march=pentiumpro
    I get

    movl 8(%esp), %eax
    movl 4(%esp), %edx
    xorl %edx, %eax
    testl &#036;-16777216, %eax
    movl 16(%esp), %eax
    cmove 12(%esp), %eax
    ret

    If I replace the return statement with

    return (x1^x2)<(1<<24) ? a : b;

    I get this code:

    movl 8(%esp), %eax
    movl 4(%esp), %edx
    xorl %edx, %eax
    cmpl &#036;16777215, %eax
    movl 16(%esp), %eax
    cmovbe 12(%esp), %eax
    ret

    The -march=pentiumpro option lets the compiler use the conditional move instructions instead of conditional jumps. That is faster because there are no branch mispredictions. The -fomit-frame-pointer option eliminates the extra overhead of setting up a frame pointer, which you should never need anyway.

    Maybe it makes a difference with other compilers. I have Borland and Mars but don't use them because they produce slower code than g++.

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    What I've got with VC 2005 SP1:

    ?f@@YAHIIHH@Z PROC ; f, COMDAT
    ; _x1&#036; = eax

    ; 3 : return (x1^x2)<(1<<24) ? a : b;

    00000 33 44 24 04 xor eax, DWORD PTR _x2&#036;[esp-4]
    00004 3d 00 00 00 01 cmp eax, 16777216 ; 01000000H
    00009 1b c0 sbb eax, eax
    0000b 83 c0 02 add eax, 2

    ; 4 : }

    0000e c3 ret 0
    ?f@@YAHIIHH@Z ENDP



    ?f@@YAHIIHH@Z PROC ; f, COMDAT
    ; _x1&#036; = eax

    ; 3 : return ((x1^x2)&0xff000000)==0 ? a : b;

    00000 33 44 24 04 xor eax, DWORD PTR _x2&#036;[esp-4]
    00004 25 00 00 00 ff and eax, -16777216 ; ff000000H
    00009 f7 d8 neg eax
    0000b 1b c0 sbb eax, eax
    0000d f7 d8 neg eax
    0000f 83 c0 01 add eax, 1

    ; 4 : }

    00012 c3 ret 0
    ?f@@YAHIIHH@Z ENDP

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    That's strange. Are you sure you didn't change to code to something like?

    int f(unsigned int x1, unsigned int x2) {
    return (x1^x2)<(1<<24) ? 1 : 2;
    }

    Here is what I get with
    gcc -O -S -march=pentiumpro -fomit-frame-pointer

    movl 8(%esp), %eax
    xorl 4(%esp), %eax
    cmpl &#036;16777216, %eax
    sbbl %eax, %eax
    addl &#036;2, %eax
    ret

    which is the same as you got with VC. (But note that if I use -O2 then the code has 1 more instruction and uses edx

    movl 8(%esp), %eax
    movl 4(%esp), %edx
    xorl %edx, %eax
    cmpl &#036;16777216, %eax
    sbbl %eax, %eax
    addl &#036;2, %eax
    ret

    but if I use -O2 -march=pentium4 then I get the shorter form like with -O).

    For this code:

    int f(unsigned int x1, unsigned int x2) {
    return ((x1^x2)&0xff000000)==0 ? 1 : 2;
    }

    I get this with -O -march=pentiumpro or -O2 -march=pentium4

    movl 8(%esp), %eax
    xorl 4(%esp), %eax
    andl &#036;-16777216, %eax
    cmpl &#036;1, %eax
    sbbl %eax, %eax
    addl &#036;2, %eax
    ret

    which is one instruction shorter than VC. g++ replaced 2 neg instructions with one cmp immediate. Looking at the opcodes with ndisasm it looks like the code is just 1 byte smaller because the cmp is 3 bytes and neg is 2 bytes.

    ndisasm -u -e0x8c x.o
    00000000 8B442408 mov eax,[esp+0x8]
    00000004 33442404 xor eax,[esp+0x4]
    00000008 25000000FF and eax,0xff000000
    0000000D 83F801 cmp eax,byte +0x1
    00000010 19C0 sbb eax,eax
    00000012 83C002 add eax,byte +0x2
    00000015 C3 ret

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Matt Mahoney
    Thats strange. Are you sure you didnt change to code to something like?
    Looks like my compiler replaced variables...

    I just briefly copied/pasted your function and tried to use it with some abuse loop - to override dead store elimination.

    I noticed that Subbotins carryless range encoder uses:

    #define TOP (1<<24)

    ...

    while ((low ^ low+range)<TOP || range<BOT && ((range= -low & BOT-1),1))
    OutByte(low>>24), range<<=8, low<<=8;


    and PPMd var.J uses:


    #define RC_ENC_NORMALIZE(stream) {
    while ((low ^ (low+range)) < TOP || range < BOT &&
    ((range= -low & (BOT-1)),1)) {
    _PPMD_E_PUTC(low >> 24,stream);
    range <<= 8; low <<= 8;
    }
    }


  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Beginning of the while loop in actual arithmetic decoder (VC 2005 SP1):

    &#036;LN13@Decode:

    ; 87 :
    ; 88 : while (((x1 ^ x2) & 0xff000000) == 0) {

    mov edx, DWORD PTR [esi]
    xor edx, DWORD PTR [esi+4]
    test edx, -16777216 ; ff000000H
    jne SHORT &#036;LN15@Decode
    npad 7
    &#036;LL2@Decode:

    ...

    &#036;LN13@Decode:

    ; 87 :
    ; 88 : while ((x1 ^ x2) < (1 << 24)) {

    mov edx, DWORD PTR [esi]
    xor edx, DWORD PTR [esi+4]
    cmp edx, 16777216 ; 01000000H
    jae SHORT &#036;LN15@Decode
    npad 7
    &#036;LL2@Decode:

    and the Intel C++:

    while (((x1 ^ x2) & 0xff000000) == 0):

    &#036;B2&#036;14: ; Preds &#036;B2&#036;12 &#036;B2&#036;13
    mov WORD PTR ?ppm&#036;3@@4VTPPM@@A+2, ax ;323.18
    mov esi, DWORD PTR ?ppm&#036;3@@4VTPPM@@A+270344 ;323.14
    mov ecx, DWORD PTR ?ppm&#036;3@@4VTPPM@@A+270348 ;323.14
    mov edx, esi ;323.18
    xor edx, ecx ;323.18
    test edx, -16777216 ;323.18
    jne &#036;B2&#036;19 ; Prob 10%

    ...

    while ((x1 ^ x2) < (1 << 24)):

    &#036;B2&#036;14: ; Preds &#036;B2&#036;12 &#036;B2&#036;13
    mov WORD PTR ?ppm&#036;3@@4VTPPM@@A+2, ax ;322.18
    mov esi, DWORD PTR ?ppm&#036;3@@4VTPPM@@A+270344 ;322.14
    mov ecx, DWORD PTR ?ppm&#036;3@@4VTPPM@@A+270348 ;322.14
    mov edx, esi ;322.18
    xor edx, ecx ;322.18
    cmp edx, 16777216 ;322.18
    jae &#036;B2&#036;19 ; Prob 10%


  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Matt!

  9. #9
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Very good Compressor! Fantastic job Matt !!

  10. #10
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    SR3 file compressor (C) 2007, Matt Mahoney
    Licensed under GPL, http://www.gnu.org/copyleft/gpl.html
    Modified by Nania Francesco Antonio (Italy)
    To compress: sr3 c input output
    To decompress: sr3 d input output
    - Added Wave Compression.
    - Implemented Bmp,txt, doc compression

  11. #11
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Where can we download this version?

  12. #12
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I have sent the source code to Matt last night, I think that the program is making a will !

  13. #13
    Programmer
    Join Date
    Feb 2007
    Location
    Germany
    Posts
    420
    Thanks
    28
    Thanked 153 Times in 18 Posts
    It will be interesting to see how it copes with multimedia files in comparison to the old version. I think it's a good idea to have a damn fast compressor with some data filtering.

    edit: I almost forgot. Great job Matt and Nania!!

  14. #14
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

  15. #15
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Thanks Francesco!

  16. #16
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Thanks LovePimple! Hi !

  17. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    btw, using --param inline-unit-growth=999999 may solve problem with slower speed of compbined executable. it's used in tornado, which executable is 500 kb long, made from 64 template-instantiated 7kb code blocks

  18. #18
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    sr3 mirrored here: http://cs.fit.edu/~mmahoney/compression/#sr2

    28,926,691 enwik8.sr3 14 + 16 sec process times
    253,031,980 enwik9.sr3 138 + 154 sec, 68 MB memory

    The .exe is a bit faster and smaller. I compiled as in the comments and packed with upack 0.399 instead of upx.

    I'm going to wait 30 days after the last update before it becomes the "official" release. (Same rule as large text benchmark).

  19. #19
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Thanks for all Matt !

  20. #20
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Updated zip file in my site winturtle!

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I looked at the sr3 source code. Main changes are the context table size is increased from 4 MB to 64 MB, context length depends on file type (shorter for .bmp) and .wav files are split into separate 1-byte streams. This mostly helps on maximumcompression.com but compression is worse on the Calgary corpus.

    When testing sr2 I found that using order 3 context worked better for smaller files than order 4, but order 4 on larger files. sr2 uses order 4. Increasing the table size has the effect of increasing the context to order 5, which is probably why the Calgary corpus compresses worse. The files are small.

    841,840 a10.jpg.sr2
    1,903,791 acrord32.exe.sr2
    649,876 english.dic.sr2
    3,850,838 FlashMX.pdf.sr2
    699,171 fp.log.sr2
    2,274,079 mso97.dll.sr2
    953,569 ohs.doc.sr2
    1,006,034 rafale.bmp.sr2
    882,891 vcfiu.hlp.sr2
    796,777 world95.txt.sr2
    13,858,866 bytes

    839,496 a10.jpg.sr3
    1,897,298 acrord32.exe.sr3
    712,601 english.dic.sr3
    3,834,589 FlashMX.pdf.sr3
    655,769 fp.log.sr3
    2,264,544 mso97.dll.sr3
    928,010 ohs.doc.sr3
    975,767 rafale.bmp.sr3
    870,298 vcfiu.hlp.sr3
    727,980 world95.txt.sr3
    13,706,352 bytes

    34,017 BIB.sr2
    276,364 BOOK1.sr2
    189,651 BOOK2.sr2
    61,876 GEO.sr2
    139,184 NEWS.sr2
    11,061 OBJ1.sr2
    88,325 OBJ2.sr2
    19,943 PAPER1.sr2
    29,744 PAPER2.sr2
    54,603 PIC.sr2
    14,957 PROGC.sr2
    19,022 PROGL.sr2
    13,831 PROGP.sr2
    22,630 TRANS.sr2
    975,208 bytes

    34,134 BIB.sr3
    284,091 BOOK1.sr3
    195,888 BOOK2.sr3
    60,342 GEO.sr3
    143,538 NEWS.sr3
    11,198 OBJ1.sr3
    89,810 OBJ2.sr3
    21,003 PAPER1.sr3
    31,277 PAPER2.sr3
    54,779 PIC.sr3
    15,757 PROGC.sr3
    19,774 PROGL.sr3
    14,479 PROGP.sr3
    22,720 TRANS.sr3
    998,790 bytes

  22. #22
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    The analysis that have done is perfect Matt! would automatically need to do so that the program him proper to the smallest contexts!

Similar Threads

  1. SSE, TSE & higher symbol probability estimations
    By Dmitry Shkarin in forum Forum Archive
    Replies: 4
    Last Post: 6th March 2008, 19:47

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •