
Thread: Byte oriented LZ77 compressor

  1. #1 Matt Mahoney (Expert)

    Byte oriented LZ77 compressor

    Below is an experimental byte-oriented LZ77 compressor. Compression on large files (enwik8/9) is slightly better than zip, although it takes about twice as long. The reason for writing it is that the output can be further compressed, so it may find its way into zpaq as a preprocessor for fast compression (-m0). Decompression is still 3x faster than writing to disk on my laptop (instead of 6x).

    The format is a little like snappy, but with a 16 MB buffer. Literals are preceded by a count of 1..64. Matches are coded as 2, 3, or 4 bytes. A 2-byte match code has a length of 4..11 and an offset up to 2K. A 3- or 4-byte match code has a length of 1..64 and an offset up to 64K or 16M respectively. (I use a minimum match length of 4 for 2- and 3-byte codes and 5 for 4-byte codes.)
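
    For example, here is the 2-byte match code worked by hand (a throwaway sketch; the helper below is only for illustration, the real encoder is write_match() in the listing further down):

    Code:
    #include <cassert>
    #include <vector>
    
    // Illustration only: pack one 2-byte match code (len 4..11, offset 1..2048)
    // the same way write_match() in the listing below does.
    std::vector<unsigned char> short_match(int len, int off) {
      --off;                                  // offsets are stored minus 1
      std::vector<unsigned char> v;
      v.push_back(64+(len-4)*8+(off>>8));     // 01xxxyyy: x=len-4, y=offset bits 8..10
      v.push_back(off&255);                   // yyyyyyyy: offset bits 0..7
      return v;
    }
    
    int main() {
      std::vector<unsigned char> v=short_match(8, 100);  // copy 8 bytes from 100 back
      assert(v[0]==0x60 && v[1]==0x63);                   // 01 100 000, 01100011
      return 0;
    }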

    There are a lot of size/speed tradeoffs for compression. I decided on storing 3 context hashes (orders 10, 7, 4) per buffer position in an 8M-entry hash table with 8, 4, and 8 pointers per bucket. It uses greedy parsing, taking the longest match and breaking ties by the nearest offset. If a match is found, then the lower-order buckets are skipped. To reduce cache misses, buckets don't cross cache lines, and the first byte of the match candidate is packed next to the pointer (8+24 bits) and compared first. Compression requires 64 MB of memory.
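
    To make the cache trick concrete before the full listing, here is the hash-table entry packing pulled out on its own (a small sketch; the real thing is in the code below):

    Code:
    #include <cassert>
    
    // Each ht[] entry packs a 24-bit buffer position with the byte stored there:
    // low 24 bits = position, high 8 bits = buf[position] (as in the listing below).
    int pack_entry(int pos, unsigned char first) { return (pos&((1<<24)-1))+(first<<24); }
    int entry_pos(int e)  { return e&((1<<24)-1); }
    int entry_byte(int e) { return (e>>24)&255; }
    
    int main() {
      int e=pack_entry(12345, 'A');
      assert(entry_pos(e)==12345 && entry_byte(e)=='A');
      // During the search, entry_byte(e) is compared with buf[i] first, so a mismatch
      // is rejected without touching buf[entry_pos(e)], i.e. no extra cache miss.
      return 0;
    }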

    Code:
    /* lzpre.cpp v1.0 - LZ77 preprocessor for ZPAQ
    
    (C) 2011 Dell Inc. Written by Matt Mahoney
    Licensed under GPL v3, http://www.gnu.org/copyleft/gpl.html
    
    Usage: lzpre c|d input output
    c = compress, d = decompress
    
    Compressed format is byte oriented LZ77. Lengths and offsets are MSB first:
    
      00xxxxxx                            x+1 (1..64) literals follow
      01xxxyyy yyyyyyyy                   copy x+4 (4..11), offset y+1 (1..2048)
      10xxxxxx yyyyyyyy yyyyyyyy          copy x+1 (1..64), offset y+1 (1..65536)
      11xxxxxx yyyyyyyy yyyyyyyy yyyyyyyy copy x+1 (1..64), offset y+1 (1..2^24)
    
    Decompression needs 16 MB memory. The compressor uses 64 MB consisting
    of 2 16 MB buffers and a 8M (32 MB) hash table as an index to find matches.
    For each byte position in the input buffer, the table stores 3 hashes
    of order 10, 7, and 4 in buckets of size 8, 4, 8 respectively. The buckets
    are searched in that order, taking the longest match found (greedy
    matching), breaking ties by taking the smaller offset. If a match
    is found, then the remaining buckets are skipped. To update, the bucket
    entry indexed by the low bits of position is replaced.
    
    The minimum match length is 5 for 4-byte match codes and 4 otherwise.
    Matches and literal strings longer than 64 are coded as a series of
    length 64 codes.
    
    As a speed optimization, each hash table entry contains a 24 bit pointer
    and the byte pointed to, packed into 32 bits. If the byte in the hash
    table mismatches, then this avoids a cache miss to compare the first
    byte of the buffer. The hash table is aligned so that buckets do not
    cross 64 byte cache lines.
    
    Results: timed on a 2.0 GHz T3200, Win32. Compiled: g++ v4.6.1 -O3 -s
    
    323,743,109 enwik9.zip     95.2s, 18.4s (process times)
    318,881,563 enwik9.lzpre  192.7s, 29.2s
    
     36,518,556 enwik8.zip     10.8s,  1.3s
     36,166,469 enwik8.lzpre   17.7s,  2.5s
    
    However, byte oriented LZ77 can be further compressed, for example:
    
     34,435,203 enwik8.lzpre.zip
     33,788,618 enwik8.lzpre.pmd
     33,659,031 enwik8.lzpre.7z
     32,919,371 enwik8.lzpre-m1.zpaq
    
    */
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    
    // Write literal sequence buf[i-lit..i-1], set lit=0
    void write_literal(unsigned char* buf, int i, int& lit, FILE* out) {
      while (lit>0) {
        int lit1=lit;
        if (lit1>64) lit1=64;
        putc(lit1-1, out);
        for (int j=i-lit; j<i-lit+lit1; ++j) putc(buf[j], out);
        lit-=lit1;
      }
    }
    
    // Write match sequence of given length and offset (off in 1..2^24)
    void write_match(int len, int off, FILE* out) {
      --off;
      while (len>0) {
        int len1=len;
        if (len1>64) len1=64;
        if (off<2048 && len1>=4 && len1<=11) {
          putc(64+(len1-4)*8+(off>>8), out);
          putc(off, out);
        }
        else if (off<65536) {
          putc(128+len1-1, out);
          putc(off>>8, out);
          putc(off, out);
        }
        else {
          putc(192+len1-1, out);
          putc(off>>16, out);
          putc(off>>8, out);
          putc(off, out);
        }
        len-=len1;
      }
    }
    
    // Args are c|d input output
    int main(int argc, char** argv) {
    
      // Start timer
      clock_t start=clock();
    
      // Check args
      if (argc!=4 || argv[1][0]!='c' && argv[1][0]!='d') {
        fprintf(stderr, "To compress/decompress: lzpre c/d input output\n");
        return 1;
      }
    
      // Open files
      FILE* in=fopen(argv[2], "rb");
      if (!in) return perror(argv[2]), 1;
      FILE* out=fopen(argv[3], "wb");
      if (!out) return perror(argv[3]), 1;
    
      // Tunable LZ77 parameters
      const int HTSIZE=1<<23;   // hashtable size, must be a power of 2
      const int BUFSIZE=1<<24;  // buffer size (each of 2), max 2^24
      const int HASHES=3;       // number of hashes computed per byte
      const int HASHORDER[HASHES]={10,7,4};  // context order per hash
      const int HASHMUL[HASHES]={44,56,96};  // hash multipliers
      const int HASHBUCKET[HASHES]={8,4,8};  // number of searches per hash
    
      // Allocate buf (uncompressed data) and hashtable ht
      // ht[h] low 24 bits points to buf[i..i+HASHORDER-1], high 8 bits is buf[i]
      unsigned char* buf=(unsigned char*)calloc(BUFSIZE, 2);
      int* ht=(int*)calloc(HTSIZE+16, sizeof(int));
      if (!buf || !ht) return fprintf(stderr, "Out of memory\n"), 1;
      ht+=16-((ht-(int*)0)&15);  // align on 64 byte address
      int h[HASHES]={0};  // context hashes of buf[i..]
    
      // Compress
      while (argv[1][0]=='c') {
    
        // Read block into second half of buf
        const int n=fread(buf+BUFSIZE, 1, BUFSIZE, in)+BUFSIZE;
        if (n<=BUFSIZE) break;
    
        // Scan the block just read in. ht may point to previous block
        int lit=0;  // number of output literals pending
        for (int i=BUFSIZE; i<n;) {
    
          // Search for longest match, or pick closest in case of tie
          // Try the longest context orders first. If a match is found, then
          // skip the lower orders as a speed optimization.
          int blen=0, bp=0, len=0;
          for (int j=0; j<HASHES; ++j) {
            for (int k=0; k<HASHBUCKET[j]; ++k) {
              int p=ht[h[j]+k];
              if ((p>>24&255)==buf[i]) {  // compare in ht first
                p=(p&BUFSIZE-1)+BUFSIZE;
                if (p>=i) p-=BUFSIZE;
                if (p>0 && p<i && p+(1<<24)>i) {
                  for (len=0; i+len<n && buf[p+len]==buf[i+len]; ++len);
                  if (len>blen || len==blen && p>bp) blen=len, bp=p;
                }
              }
            }
            if (blen>=HASHORDER[j]) break;
          }
    
          // If match is long enough, then output any pending literals first,
          // and then the match. blen is the length of the match.
          const int off=i-bp;  // offset
          if (blen>3+(off>=65536) && off>0 && off<(1<<24)) {
            write_literal(buf, i, lit, out);
            write_match(blen, off, out);
          }
    
          // Otherwise add to literal length
          else {
            blen=1;
            ++lit;
          }
    
          // Update index, advance blen bytes
          while (blen--) {
            for (int j=0; j<HASHES; ++j)
              ht[h[j]+(i&HASHBUCKET[j]-1)]=(i&BUFSIZE-1)+(buf[i]<<24);
            ++i;
            for (int j=0; j<HASHES; ++j) {
              if (i+HASHORDER[j]<=n) {
                h[j]/=HASHBUCKET[j];
                h[j]*=HASHMUL[j];
                h[j]+=buf[i+HASHORDER[j]-1]+1;
                h[j]*=HASHBUCKET[j];
                h[j]&=HTSIZE-1;
              }
            }
          }
        }
    
        // Write pending literals at end of block
        write_literal(buf, n, lit, out);
    
        // Move data from second half of buf to first half if more input
        // is expected.
        if (n==BUFSIZE*2) memmove(buf, buf+BUFSIZE, BUFSIZE);
      }
    
      // Decode. state is as follows:
      // 0 = expecting a literal or match code.
      // 1 = decoding a literal with len bytes remaining.
      // 2 = expecting last offset byte of a match code.
      // 3,4 = expecting 2,3 match offset bytes.
      if (argv[1][0]=='d') {
        int c, i=0, state=0, len=0, off=0;
        while ((c=getc(in))!=EOF) {
          if (state==0) {
            state=1+(c>>6);
            if (state==1) off=0, len=c+1;  // literal
            else if (state==2) off=c&7, len=(c>>3)-4;  // short match
            else off=0, len=(c&63)+1;  // match
          }
          else if (state==1) { // literal
            putc(buf[i++&BUFSIZE-1]=c, out);
            if (--len==0) state=0;
          }
          else if (state>2) {  // reading offset
            off=off<<8|c;
            --state;
          }
          else { // state==2, match
            off=off<<8|c;
            off=i-off-1;
            while (len--)
              putc(buf[i++&BUFSIZE-1]=buf[off++&BUFSIZE-1], out);
            state=0;
          }
        }
      }
    
      // Print compression statistics
      printf("%ld -> %ld in %1.2f sec\n", ftell(in), ftell(out),
        double(clock()-start)/CLOCKS_PER_SEC);
      return 0;
    }

  2. #2 Nania Francesco (Tester)

    Preprocessor or Compressor?

    Hi Matt,
    to me this is not a preprocessor but a very nice LZ77 compressor!
    Best regards!

  3. #3 joerg (Member)
    @matt:
    lzpre.cpp v1.0 - LZ77 preprocessor
    - sounds interesting
    - is there a Windows binary available for testing?

    In a first step we could collect the files and quickly build an archive;
    in a second step we could apply stronger compression to make the archive smaller.

    Would this work?

  4. #4 Shelwien (Administrator)

  5. #5 Sebastian (Member)
    Quote Originally Posted by joerg View Post
    In a first step we could collect the files and quickly build an archive;
    in a second step we could apply stronger compression to make the archive smaller.

    Would this work?
    No, because the context dependencies will be mostly destroyed.

  6. #6 m^2 (Member)
    Interesting. I wish I had time to look into it more closely. Any benchmarks? Matt, you're often the first to post benchmark results for new codecs, yet you didn't do it with one of your own!

  7. #7 Matt Mahoney (Expert)
    So here is the mystery. I can improve compression by using a smarter replacement policy in the hash table index. Currently, I select the bucket element to be replaced using the low bits of the buffer position, which is effectively random. In fact, selecting an element using rand() improves compression. The best policy ought to be replacing the oldest element (lowest pointer value), and indeed this improves compression the most. But look at the compression times for enwik8:

    Code:
          // Update index, advance blen bytes
          while (blen--) {
            for (int j=0; j<HASHES; ++j) {
              int bk=0, ba=0;  // find bk=oldest pointer with age ba
              for (int k=0; k<HASHBUCKET[j]; ++k) {
                int age=i-ht[h[j]+k]&BUFSIZE-1;
                if (age>ba) ba=age, bk=k;
              }                                                             // Size   Ctime Dtime
    //        ht[h[j]+(i&HASHBUCKET[j]-1)]=(i&BUFSIZE-1)+(buf[i]<<24);      // 36166469 17.4 2.4 (original code)
    //        ht[h[j]+(rand()&HASHBUCKET[j]-1)]=(i&BUFSIZE-1)+(buf[i]<<24); // 36151879 28.0 2.4 (replace at random)
              ht[h[j]+bk]=(i&BUFSIZE-1)+(buf[i]<<24);                       // 36054685 57.6 2.4 (replace oldest)
            }
            ++i;
    Is a linear search over 20 elements per byte really that slow? Leaving the search in but ignoring its result doesn't significantly affect compression time. Is the compiler optimizing it out? Cache misses somewhere else? I'm still trying to figure it out.

    I did try a bunch of other replacement policies, like moving matches to the front of the bucket and various methods of replacing only the back elements, or preferentially replacing elements where the current byte doesn't match. But these attempts made both speed and compression worse so I'm not going to bother with them.

  8. #8 Sebastian (Member)
    Since you place the hash values and buckets in the same table, strange memory access patterns may appear. Maybe you could try ht[HASHSIZE][BUCKETSIZE].

  9. #9 Matt Mahoney (Expert)
    To answer other questions: I used a byte-oriented code so that it could be compressed further. As you can see in my OP, you can squeeze out another 10% with most compressors. I don't expect compression to be as good as compressing the original data directly, but that's OK because it is intended to be fast. It will probably be a preprocessor for zpaq using a single component for modeling (CM or ICM). But I may replace relative offsets with absolute ones (replacing the low bits of the offset) or increase the minimum match length to improve modeling, or make other changes to the format. This is still experimental, so I'll benchmark it when I finish developing it.

  10. #10 Bulat Ziganshin (Programmer)
    >Is a linear search over 20 elements per byte really that slow?

    It's 20x slower than performing a single check. This is the inner loop, and given the simplicity of the encoding you just don't have anything else that is so time-consuming.

    Also, I haven't looked into the details of your code, but why not use a rotating pointer into the array?

  11. #11 Matt Mahoney (Expert)
    Quote Originally Posted by Sebastian View Post
    Since you place the hash values and buckets in the same table, strange memory access patterns may appear. Maybe you could try ht[HASHSIZE][BUCKETSIZE].
    Originally I wrote it that way, but I wanted variable-sized buckets. I'm still effectively doing that, just computing the 2-D addresses myself. I tried making all of the bucket sizes the same to make sure that wasn't causing the problem.
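
    To illustrate the equivalence (a sketch; the constants here are made up for the example, not the real lzpre sizes):

    Code:
    #include <cassert>
    #include <stdlib.h>
    
    int main() {
      // With a power-of-2 bucket size, a flat table indexed by a bucket-aligned
      // hash is the same memory layout as a 2-D table ht[HASHSIZE][BUCKETSIZE].
      const int HTSIZE=1<<10, BUCKET=8;
      int* ht=(int*)calloc(HTSIZE, sizeof(int));
      int (*ht2)[BUCKET]=(int(*)[BUCKET])ht;    // 2-D view of the same storage
    
      int h=0;
      h*=123; h+=45; h*=BUCKET; h&=HTSIZE-1;    // update ends with *=BUCKET and a
                                                // power-of-2 mask, so h stays bucket-aligned
      for (int k=0; k<BUCKET; ++k)
        assert(&ht[h+k]==&ht2[h/BUCKET][k]);    // flat and 2-D address the same slot
    
      free(ht);
      return 0;
    }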

  12. #12 Matt Mahoney (Expert)
    The decoder uses a rotating buffer. I know it would save 16 MB memory to do the same with the encoder, but it complicates the coding to have to mask all of the indexes and to look ahead for matches without hitting the back of the buffer. The encoder now can't find matches that cross 16M boundaries, but I don't think the effect is very big. I may update it later but for now I care more about working code. (Besides, the decoder allocates 64 MB but only uses 16 MB of it. I don't care because I'm going to re-implement it in ZPAQL eventually).

  13. #13 Matt Mahoney (Expert)
    OK, I think I figured it out. The linear search really is taking that long. Making the hash table tiny saves about half of the time, so it is partially due to cache misses (3 per byte). The code was already writing to these 3 cache lines, but that had no effect on speed because writes to memory can be done in parallel with other instructions; reads cannot. To measure this, I had to add a statement so g++ wouldn't optimize the search out, like this:

    Code:
              for (int k=0; k<HASHBUCKET[j]; ++k) {
                int age=i-ht[h[j]+k]&BUFSIZE-1;
                if (age>ba) ba=age, bk=k;
              }
              foo+=bk;  // 3 times slower with this statement
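
    For reference, another common way to keep the result live (not what lzpre uses; just a sketch) is a volatile sink:

    Code:
    #include <stdlib.h>
    
    volatile int sink;  // a store to a volatile cannot be optimized away
    
    int main() {
      const int N=20;
      int* ht=(int*)calloc(N, sizeof(int));
      int ba=0, bk=0;
      for (int k=0; k<N; ++k) {        // stand-in for the bucket age scan above
        int age=ht[k]&255;
        if (age>ba) ba=age, bk=k;
      }
      sink=bk;                         // forces the loads in the loop to actually happen
      free(ht);
      return 0;
    }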

  14. #14 m^2 (Member)
    Matt, it would be better if you moved the benchmark results from the source into the post. Guess how many people will go through the file format specification to find them. I did a quick and rough test on scc.tar (Pentium D 2.66, time = process time). Not sure if c1 wouldn't be a more direct competitor, but decompression speed is roughly the same.

  15. #15 Bulat Ziganshin (Programmer)
    It would be better to use tornado itself for such tests, although probably I just haven't compiled the latest version yet.

  16. #16 Matt Mahoney (Expert)
    My attempt to further compress lzpre output.

    Code:
    30,634,055 enwik8.lzpre-mm0.zpaq 14.1 sec, 19.6 sec (2.0 GHz T3200, 1 thread)
    31,083,558 enwik8.lzpre-7.paq8px_v69 (5957 sec)
    31,821,701 enwik8.lzpre-cc.nz
    32,108,426 enwik8.lzpre-m4.zpaq
    32,436,235 enwik8.lzpre-7.lpaq9m
    32,458,343 enwik8.lzpre-b40.bsc
    32,680,875 enwik8.lzpre-m2.zpaq
    32,688,175 enwik8.lzpre.pmm
    32,800,247 enwik8.lzpre-m3.zpaq
    32,919,371 enwik8.lzpre-m1.zpaq
    33,084,027 enwik8.lzpre-7.ccmx
    33,137,874 enwik8.lzpre.nz
    33,659,031 enwik8.lzpre.7z
    34,435,203 enwik8.lzpre.zip
    36,166,469 enwik8.lzpre
    m0.cfg is a model customized for lzpre output. It uses a single CM (the fastest possible ZPAQ model). The context is the parse state, order 1 for literals, and order 0 for match/literal codes taking the entire code as a single symbol. In the case of match codes, offsets are modeled omitting the 2 low bits of the first byte and omitting subsequent offset bytes. In the case of a 2 byte match code, the 2 dropped bits are bits 8-9 of the offset. In the case of a 3 or 4 byte match code, the dropped bits are the 2 low bits of the match length.

    Code:
    (m0.cfg model for lzpre output)
    (enwik8.lzpre 36,166,469 -> 30,634,055 14.1s 19.6s)
    
    comp 0 0 0 0 1
      0 cm 20 48
    hcomp
      (c=state: 0=init, 1=expect LZ77 literal or match code,
       2..4=expect n-1 offset bytes,
       5..68=expect n-4 literals)
      b=a (save input)
      a=c a== 1 if (expect code ccxxxxxx as input)
        (cc is number of offset bytes following)
        (00xxxxxx means x+1 literal bytes follow)
        a=b a>>= 6 a&= 3 a> 0 if
          a++ c=a (high 2 bits is code length)
          *d=0 a=b a>>= 2 hashd
        else
          a=b a&= 63 a+= 5 c=a (literal length)
          *d=0 a=b hashd
        endif
      else
        a== 5 if (end of literal)
          c= 1 *d=0
        else
          a== 0 if (init)
            c= 5 *d=0 (initial state will be 7+length of postprocessor)
          else (literal or offset)
            c--
            (model literals in order 1 context, offset order 0)
            a> 5 if *d=0 a=b hashd endif
          endif
        endif
      endif
    
      (model parse state as context)
      a=c a> 5 if a= 5 endif hashd
      halt
    post
      0 (lzpre decoder will go here as postprocessor)
    end
    I also did some experiments where I modified lzpre to encode offsets as the low bits of absolute addresses rather than differences. I reasoned that repeated matches would compress better this way, but the result was the opposite, even when I modified the matching algorithm to favor previously used matches over closer ones of the same length.

  17. #17 Shelwien (Administrator)
    Another interesting thing to try is generating a .rec dump with the structure described in http://encode.su/threads/1381-rec-toolkit
    For example, it lets you see the effect of your parsing combined with lzma's entropy coder.

  18. #18 Matt Mahoney (Expert)
    Yeah, I looked at the number of literal and match codes of each length while I was developing the model, then took the extra code out for release. There are obviously better coding choices (like Huffman) if this were the final result, but I figured it wouldn't matter too much because the modeling stage only sees literals and matches in any case. For example, if I used a really inefficient code like putting an escape byte before every literal, the only impact should be on speed (in theory).

  19. #19 Matt Mahoney (Expert)
    lzpre v2.0 attached. Compressed size is 3% worse on enwik8, but 2% better after recompression with zpaq -mm0. Compression is about 5% slower.

    Code:
     36,518,556 enwik8.zip         10.8s,  1.3s
     37,144,854 enwik8.lzpre       18.4s,  2.6s
     29,906,718 enwik8.lzpre.zpaq  14.5s  20.0s
    Changes:
    - Minimum match length is 5,6,7 for match codes of length 2,3,4.
    - Only 2 hashes instead of 3: order 10 (bucket size 16) and order 5 (bucket size 8).
    - Short (2 byte) matches encode a length of 5..12 instead of 4..11 (incompatible with v1.0).
    - If a match of length >= 128 is found, then don't look for a longer one.
    - Decompression only allocates the memory it needs (16 MB) instead of 64 MB.

    I can improve compression slightly (0.2%) by choosing a bucket element randomly to update, but there is too much speed penalty (about 10%) even with a fast PRNG. Not sure why, because buckets are cache aligned.

    Not sure I like this format, however. I did an experiment where I removed all the 2 byte match codes (forcing 3 bytes) and there was hardly any effect on compression after modeling. There may be other uses for these codes.

  20. #20 Matt Mahoney (Expert)
    lzpre20.zip (source and Win32 exe) attached.

  21. #21 Matt Mahoney (Expert)
    I wrote a ZPAQ config file to go with lzpre v2. I'm a bit disappointed that the ZPAQL version of the decoder is slower than the C++ version (4.0 sec vs. 2.5 sec on enwik8). Nevertheless, I posted lz1.zip with the preprocessor (lzpre.cpp, lzpre.exe) and config file (lz1.cfg) to the sample configuration files listed at http://mattmahoney.net/dc/zpaq.html
    I changed the model from a CM to an ICM because it gives better compression on calgary.tar and the maximumcompression.com corpus, although worse on enwik8/9.

    zpaq -mlz1 enwik9 -> 271,702,398, 391 sec (66 MB), 227 sec (18 MB), 1 thread on a 2.0 GHz T3200, Win32.
    lzpre -> 327,501,489, 215 sec, 26 sec.

    You can decode lzpre files using either "lzpre d input output" or "zpaq -mlz1 r input output" to compare the C++ and ZPAQL versions.

  22. #22 Matt Mahoney (Expert)
    It seems I traced most of the slowness of the ZPAQL version to the OUT instruction.

    lzpre d enwik8.lzpre enwik8 -> 2.1 sec (process time)
    lzpre d enwik8.lzpre nul: -> 1.5 sec
    zpaq -mlz1 r enwik8.lzpre enwik8 -> 4.0 sec
    zpaq -mlz1 r enwik8.lzpre nul: -> 3.5 sec

    Then removing "out" from lz1.cfg
    zpaq -mlz1 r enwik8.lzpre nul: 1.8 sec

    The ZPAQL version generates JIT code that translates "out" into a call that passes 3 parameters (this, out, sha1) to a C++ function. That function in turn tests out and sha1 for NULL and, if they are not NULL, calls the virtual function Writer::put(int) that each of them points to, which in turn calls putc(). OTOH the C++ version probably just inlines code that writes to a buffer.
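
    Roughly, the shape of that per-byte call chain is something like this (a simplified sketch, not the actual zpaq code; the class and function names are only approximate):

    Code:
    #include <stdio.h>
    
    // Simplified sketch of the per-byte overhead described above (not real zpaq code).
    struct Writer {
      virtual void put(int c)=0;   // one virtual call per output byte
      virtual ~Writer() {}
    };
    
    struct FileWriter: public Writer {
      FILE* f;
      FileWriter(FILE* f_): f(f_) {}
      void put(int c) { putc(c, f); }
    };
    
    // What the JIT-ed "out" instruction amounts to: an out-of-line call that
    // re-checks its sink for NULL on every byte, then dispatches virtually.
    void out_instruction(int c, Writer* out) {
      if (out) out->put(c);
    }
    
    int main() {
      FileWriter w(stdout);
      const char* s="hi\n";
      for (int i=0; s[i]; ++i) out_instruction(s[i], &w);
      return 0;
    }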

