
Thread: zpaq updates

  1. #31
    Member
    Join Date
    Feb 2009
    Location
    Cicero, NY
    Posts
    9
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Matt,

    What is a reasonable upper bound for memory? My machine errors out when I allow zpaq to use more than 2100 MB. I can lower this limit to stay within a range that is testable on your machine, and to facilitate usable configurations.

    I just reduced my enwik9 config to use 1997.568 MB and possibly get a small increase in performance, to be tested later today.

    In the past when I made max_enwik9, initially I used XWRT to pre-code the data, but found after some testing I could achieve better results without using XWRT. I am not sure if the same will prove true for DRT but I will continue to press a zpaq config to match the result of a DRT|zpaq combination.

    Thanks,

    Mike

  2. #32
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    Windows doesn't provide a contiguous memory block larger than 2 GB. It may be fixed by using a 64-bit paq version or by changing paq to allocate memory in several chunks.

  3. #33
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    max_enwik9.drt uses 1952 MB which is about as high as I can go on my PC in 32 bit Vista. BTW I did get 145,991,692 for enwik9.drt. In my results I added the compressed dictionary size to get 146,078,502.

    http://mattmahoney.net/dc/text.html#1493

    It would be interesting if xwrt gives better results. I would try it with space modeling off, shorter words on, and capitalization modeling on. I think xwrt also uses bytes 128-255 to code words. max_enwik9drt.cfg modified the word hashing code to accept 65-255 as letters and be case sensitive. There is no source for DRT but it uses the same dictionary format as xwrt and is probably based on an earlier version of it (xml-wrt) which used external dictionaries.
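
    As a side note, a word hash of that kind could look roughly like the hedged sketch below; the function name and hash constants are placeholders, not code from max_enwik9drt.cfg.

    Code:
      #include <cstdint>

      // Hypothetical helper, not taken from max_enwik9drt.cfg. Bytes 65..255
      // are treated as letters and case is preserved; any other byte resets
      // the word context.
      uint32_t update_word_hash(uint32_t h, uint8_t c) {
        if (c >= 65) return (h + c + 512) * 773;  // rolling hash; constants assumed
        return 0;
      }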

    zpaq can allocate at most 1 GB per component except MIX which can allocate < 2 GB depending on number of inputs and MATCH which can have 2 arrays up to 1 GB each. Each allocation is separate so on a 64 bit machine you could potentially have much more memory.

  4. #34
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    Matt, how about 64-bit executable?

  5. #35
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    First I need 64 bit Windows/Linux. Anyone want to compile it for me?

  6. #36
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    I have 64-bit msvc/icl. Never used 64-bit Linux but will try to download one from http://bagside.com/#fifth

  7. #37
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    zpaq v1.04. http://mattmahoney.net/dc/

    - zpaq (but not unzpaq) can now list and extract from self extracting archives without executing them.
    - zpaq (but not unzpaq) now displays compression/decompression progress.
    - zpaqsfx.exe stub is slightly smaller.
    - zpaqsfx.cpp: fixed a compiler issue (replaced "and" with "&&"; g++ didn't catch it).

    If zpaq doesn't find the string "zPQ\x01" at the beginning of the archive then it searches for the 16 byte tag to find the start. So you can append an archive to just about any file and zpaq will extract from it.

    There is no change to compression ratio, speed, memory, etc. No change in archive format, so no need to retest. The changes are only to make it a little easier to use.

    I didn't update unzpaq because it's the reference decoder and doesn't need extra features. I also didn't add the progress meter to zpaqsfx because it's supposed to be small.

  8. #38
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    Quote Originally Posted by Matt Mahoney View Post
    First I need 64 bit Windows/Linux. Anyone want to compile it for me?
    msvc/icl 64 bit, command line: [i]cl -O2 -Gy zpaq104.cpp

    Assertion failed: sizeof(long)==sizeof(char*), file zpaq104.cpp, line 1699

  9. #39
    Member
    Join Date
    Jun 2008
    Location
    G
    Posts
    377
    Thanks
    26
    Thanked 23 Times in 16 Posts
    No problems here on Ubuntu 9.04 64-bit

    with cmd: g++ -O3 -o start zpaq104.cpp

    PS: can't wait until farc is 64-bit compatible

  10. #40
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    So I guess type long is 32 bits in MSVC/64 and 64 bits in g++/64. Probably the only place this is important is in class Array where I used pointer casts to align the array on 64 byte cache line boundaries.

    Code:
      n=sz-1;  // sz = number of elements of T to allocate
      data=(T*)calloc(64+(n+1)*sizeof(T), 1);  // T *data
      if (!data) fprintf(stderr, "Out of memory\n"), exit(1);
      offset=64-int((long)data&63);  // int offset = bytes to add to align
      assert(offset>0 && offset<=64);
      data=(T*)((char*)data+offset);  // aligned
    and later

    Code:
        free((char*)data-offset);
    So if sizeof(data)==8 and sizeof(long)==4 then the cast throws away the high bits of the address before the &63 instead of after. I think that should be safe on normal architectures, but who knows? Officially the behavior of the (long) cast is undefined in either case, but the next assertion should probably catch anything strange. Anyway, I think the long==pointer size check can be taken out and it will still work.
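
    For what it's worth, here is a hedged sketch of the same allocation written with uintptr_t, which is pointer-sized under both LP64 (g++/64) and LLP64 (MSVC/64), so the sizeof(long)==sizeof(char*) check would not be needed. Names are placeholders, not the actual Array code.

    Code:
      #include <cstddef>
      #include <cstdint>
      #include <cstdio>
      #include <cstdlib>

      // Hypothetical stand-in for the Array allocation above.
      template <class T>
      T* alloc_aligned64(size_t n, int& offset) {   // n = number of elements of T
        char* raw = (char*)calloc(64 + n * sizeof(T), 1);
        if (!raw) fprintf(stderr, "Out of memory\n"), exit(1);
        offset = 64 - int((uintptr_t)raw & 63);     // bytes to add to align, 1..64
        return (T*)(raw + offset);                  // free later with free((char*)p - offset)
      }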

    The alignment speeds up CM, ICM, and ISSE because it guarantees that 4 consecutive prediction/updates hit the same cache line and that ICM and ISSE hash searches stay within a cache line. This is why for a CM direct lookup (like I used in max_o2.cfg) I reserved 9 low bits for the order 0 context instead of 8. The previously encoded bits of the current byte are expanded using what I called the HMAP4 code:

    abcdefgh (predicted bit 7..0 -> context)

    a -> 000000001
    b -> 00000001a
    c -> 0000001ab
    d -> 000001abc
    e -> 1abcd0001
    f -> 1abcd001e
    g -> 1abcd01ef
    h -> 1abcd1efg

    and that is XORed with the CM bytewise context (hashed or not).
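
    In C++ terms the expansion could be computed roughly like this (a hedged sketch; the function and argument names are mine, not zpaq's):

    Code:
      #include <cstdint>

      // bits = previously coded bits of the current byte, MSB first;
      // n = how many of them have been seen (0..7). Returns the HMAP4 code
      // that gets XORed with the bytewise context.
      uint32_t hmap4(uint32_t bits, int n) {
        if (n < 4) return (1u << n) | bits;              // 1 followed by the 0..3 high bits
        uint32_t hi = bits >> (n - 4);                   // completed high nibble abcd
        uint32_t lo = bits & ((1u << (n - 4)) - 1);      // low-nibble bits seen so far
        return 256 | (hi << 4) | (1u << (n - 4)) | lo;   // 1 abcd 1 efg...
      }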

    The ICM and ISSE are organized into 16 byte arrays of states indexed by the low 4 bits of the HMAP code. The high bits are combined with the context hash to index the hash table. Hash collisions are detected by an 8 bit checksum in the first byte of the 16 byte array. (Note that the low 4 bits are never 0). If a collision is detected, then the hash address is XORed with 1 and then 2, which keeps all 3 hash addresses in the same cache line during the search. The alignment also keeps the 16 byte element from crossing a cache line boundary.
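
    A hedged sketch of that lookup is below; the bucket layout and the h, h^1, h^2 probe order follow the description above, while the hash constant, checksum derivation, and the replacement policy are simplified placeholders.

    Code:
      #include <cstdint>
      #include <cstring>

      struct Bucket { uint8_t b[16]; };  // b[0] = 8-bit checksum, b[1..15] = bit history states
                                         // (the low 4 bits of the HMAP4 code select a state)

      // table has a power-of-2 number of Buckets and is 64-byte aligned, so
      // h, h^1 and h^2 all fall inside one 64-byte cache line.
      uint8_t* find_bucket(Bucket* table, uint32_t mask, uint32_t cx) {
        uint32_t h = (cx * 2654435761u) & mask;   // bucket index (hash constant assumed)
        uint8_t chk = uint8_t(cx >> 24);          // checksum from high context bits (assumed)
        uint32_t probes[3] = {h, h ^ 1, h ^ 2};
        for (uint32_t p : probes)
          if (table[p].b[0] == chk) return table[p].b;   // hit
        std::memset(table[h].b, 0, 16);           // miss: real code would evict the least used
        table[h].b[0] = chk;
        return table[h].b;
      }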

    Alignment probably does not have much effect on MIX, MIX2, or SSE, but it may help a bit. It has no effect on MATCH, CONST, or AVG.

  11. #41
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I did some experiments with models for BWT. I modified BBB to output the uncompressed BWT transform of its input (as a single block) without compressing. I got the following results, with BBB shown for comparison.

    bbb enwik8=20847290 enwik9=164032650
    zpaq cbwt.cfg enwik8=20756606 enwik9=163563492

    bwt.cfg describes a chained order 0, 1, 2, 4 ISSE model followed by an order 0 mixer and SSE with fast adapting partial bypass.

    Code:
    comp 4 0 0 0 7
      0 icm 5 (order 0)
      1 isse 13 0 (order 1)
      2 isse 19 1 (order 2)
      3 isse 20 2 (order 4)
      4 mix 9 0 4 8 255
      5 sse 8 4 8 255
      6 mix2 0 4 5 255 255
    hcomp
      (b=last 4 bytes)
      b<>a a<<= 8 a+=b b=a
      d= 1 a<<= 24 a>>= 15 *d=a
      d++ *d=0 a=b a<<= 16 a>>= 16 hashd
      d++ *d=0 a=b hashd
      halt
    post
      0
    end
    Next step is to add BWT as a preprocessor and write the inverse BWT in ZPAQL. Probably just the fast modes, which means you would need multiple blocks for enwik9.

    BWT is attached, in case you want to make any really large block BWT transforms with 1.25 x blocksize memory.
    Attached Files

  12. Thanks:

    encode (20th April 2016)

  13. #42
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    zpaq 1.07 http://mattmahoney.net/dc/#zpaq

    Config files can now take arguments. For example

    zpaq cmin.cfg x calgary\* -> 1030817, 4 MB, 2.1 sec
    zpaq cmin.cfg,1 x calgary\* -> 1016944, 8 MB, 2.3 sec
    zpaq cmin.cfg,2 x calgary\* -> 1004572, 16 MB, 2.6 sec
    zpaq cmin.cfg,3,1 x calgary\* -> 998667, 33 MB, 3.1 sec
    zpaq cmin.cfg,-1 x calgary\* -> 1058727, 1 MB, 1.9 sec

    The first argument doubles memory for each increment. The second argument increases the LZP minimum match length. The arguments are passed as $1 and $2 in the config file and default to 0. You can add to them in the code (but no other arithmetic) like $1+20, which means if you pass 3 as the first argument it becomes 23. Here is the new min.cfg. Note that parameters are used to set PH and PM (log2 of sizes of arrays H and M, used as LZP index and buffer) and are also passed to lzppre.exe as arguments so it can use the same size arrays. $2+2 appears in the code both as an argument to lzppre.exe to set the minimum match length and also in the ZPAQL code to decode to the same length. These also have to match or the transform fails. (zpaq checks for this).

    Code:
    (zpaq 1.07 minimum (fast) compression.
    Uses 4 x 2^$1 MB memory. $2 increases minimum match length)
    
    comp 3 3 $1+18 $1+20 1 (hh hm PH PM n)
      0 cm $1+19 5 (context model size=2^19, limit=5*4)
    hcomp
      *d<>a a^=*d a<<= 8 *d=a (order 2 context)
      halt
    pcomp lzppre $1+18 $1+20 127 $2+2 96 ;
      (lzppre PH PM ESC MINLEN HMUL)
      (If you change these values, then change them in the code too)
    
      (The sequence ESC 0 codes for ESC. The sequence ESC LEN
       codes for a match of length LEN+MINLEN at the last place
       in the output buffer M (size 2^PM) that had the same context
       hash in the low PH bits of D. D indexes hash table H
       which points into buffer M, which contains B bytes.
       When called, A contains the byte to be decoded and F=true
       if the last byte was ESC. The rolling context hash D is
       updated by D=D*HMUL+M[B])
    
      if (last byte was ESC then copy from match)
        a> 0 jf 50 (goto output esc)
        a+= $2+2 (MINLEN)
        r=a 0 (save length in R0)
        c=*d (c points to match)
        do (find match and output it)
          *d=b (update index with last output byte)
          a=*c *b=a b++ c++ out (copy and output matching byte)
          d<>a a*= 96 (HMUL)
          a+=d d<>a (update context hash)
          a=r 0 a-- r=a 0 (decrement length)
        a> 0 while (repeat until length is 0)
        halt
      endif
    
      (otherwise, set F for ESC)
      a== 127 (ESC) if
        halt
      endif
    
      (reset state at EOF)
      a> 255 if
        b=0 c=0 a= 1 a<<= $1+18 d=a
        do
          d-- *d=0 a=d (clear index)
        a> 0 while
        halt (F=0)
    
        (goto here: output esc)
        a= 127 (ESC)
      endif
      *d=b (update index)
      *b=a b++ out (update buffer and output)
      d<>a a*= 96 (HMUL)
      a+=d d<>a (update context hash)
      halt
    end
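    For readers who don't speak ZPAQL, here is a hedged C++ sketch of the same decoding scheme with $1=$2=0: ESC 0 codes a literal ESC, and ESC LEN copies LEN+MINLEN bytes from the last position that had the same context hash. It mirrors the PCOMP code above (the per-file reset at EOF is omitted) and is not the actual lzppre source.

    Code:
      #include <cstdint>
      #include <cstdio>
      #include <vector>

      int main() {
        const int PH = 18, PM = 20;                  // log2 sizes of index H and buffer M ($1=0)
        const int ESC = 127, MINLEN = 2, HMUL = 96;  // same constants as min.cfg with $2=0
        std::vector<uint32_t> H(1u << PH, 0);        // context hash -> last position in M
        std::vector<uint8_t> M(1u << PM, 0);         // output history buffer
        uint32_t b = 0, d = 0;                       // output position, rolling context hash
        bool esc = false;
        auto put = [&](int ch) {                     // emit one byte, update index and hash
          H[d & ((1u << PH) - 1)] = b;
          M[b & ((1u << PM) - 1)] = uint8_t(ch);
          putchar(ch);
          d = d * HMUL + uint8_t(ch);
          ++b;
        };
        int c;
        while ((c = getchar()) != EOF) {
          if (esc) {
            esc = false;
            if (c == 0) { put(ESC); continue; }            // ESC 0 -> literal ESC
            uint32_t len = c + MINLEN;                     // ESC LEN -> copy LEN+MINLEN bytes
            uint32_t src = H[d & ((1u << PH) - 1)];        // last place with this context
            while (len--) put(M[src++ & ((1u << PM) - 1)]);
          } else if (c == ESC) {
            esc = true;                                    // next byte is 0 or a length
          } else {
            put(c);                                        // ordinary literal
          }
        }
        return 0;
      }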
    One minor (but incompatible) change is that the external preprocessor command has to end with ; (with a space before). Also there is a bug fix. I had to add code to clear the index at EOF to keep the postprocessor synchronized with the preprocessor. This was causing decoding errors when more than one file was compressed at a time. Alternatively I could have fixed it by saving the index to archive.$zpaq.tmp between calls. But this was easier (but slower). The code to clear H is:
    Code:
        a= 1 a<<= $1+18 d=a (clear array H)
        do
          d-- *d=0 a=d
        a> 0 while
    The size of H is 2^($1+18). The loop also exits with d=0 and F=0 which was the initial state. (d always points to H).

    v1.07 also cleans up the display when tracing code. When it dumps large memory arrays, it omits lines of all zeros.

    I cleaned up the source code too. Class ZPAQL has fewer data members and a simpler interface. I took compile() out of the class.

    mid.cfg and max.cfg now take 1 argument to double memory usage:

    zpaq cmax.cfg x calgary\* -> 644433, 35.8 sec, 244 MB
    zpaq cmax.cfg,1 x calgary\* -> 644320, 36.4 sec, 476 MB
    zpaq cmax.cfg,-5 x calgary\* -> 665326, 34.2 sec, 22 MB

  14. #43
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Next step is to add BWT as a preprocessor and write the inverse BWT in ZPAQL. Probably just the fast modes, which means you would need multiple blocks for enwik9.
    I tried writing inverse BWT in ZPAQL (fast mode).

    Code:
    pcomp bwt b4194048 ;
      b=a  
      a> 255 ifnot
        *c=a (save byte in M array)
        a=c a== 7 if
          c= 0
          a=*c c++ a<<= 8 a+=*c c++ a<<= 8 a+=*c c++ a<<= 8
          a+=*c c++ a+= 255 r=a 0 (save n+255 in r0)
          a=*c c++ a<<= 8 a+=*c c++ a<<= 8 a+=*c c++ a++ a<<= 8
          a+=*c r=a 1 (save p+256 in r1)
          c= 255
        endif
        c++
      endif
    
      a=r 0 a> 0 if a<c endif if (detect end of block)
    
        (count bytes)
        d= 0 *d= 255 *d++ do d++ *d= 0 a=d a< 255 while (clear array)
        c= 255 a=r 0
        do
          c++ a>c
          d=*c d++ *d++
        while
        d= 0
        do
          c=*d d++ a=*d a+=c *d=a
          a=d a< 255
        while
    
        (fast mode: build linked list)
        c= 255 a=r 0
        do
          c++ a>c
          d=*c *d++ d=*d d-- *d=c (H[M[M[c]]++]=c)
        while
    
        (traverse list)
        d=r 1 (read p+256 from r1)
        c= 255
        do
          a=r 0
          c++ a>c
          b=d a=*b out (output M[p])
          d=*d (p=H[p])
        while
    
        c= 0
      endif
      halt
    end
    EDIT:
    This is for 4 MB blocks, using 20 MB memory (5x the block size).
    To increase the block size you need to change:
    1) pcomp bwt b4194048 ; -> pcomp bwt b[2^MEM-256] ;
    2) comp 4 0 22 22 7 -> comp 4 0 [MEM] [MEM] 7
    where the default value for MEM is 22 (2^22 is 4 MB).
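
    The same counting/linking idea, written as a hedged C++ sketch. It uses the textbook convention (no sentinel, primary index = row of the original string, backward traversal), so it is not a literal translation of the forward-linked ZPAQL above, and it ignores bwtpre's block header.

    Code:
      #include <cstdint>
      #include <vector>

      // L = BWT last column, primary = row of the original string among the
      // sorted rotations. Returns the decoded block.
      std::vector<uint8_t> inverse_bwt(const std::vector<uint8_t>& L, uint32_t primary) {
        const size_t n = L.size();
        uint32_t count[256] = {0};
        for (uint8_t c : L) ++count[c];               // count bytes
        uint32_t cum[256], sum = 0;
        for (int c = 0; c < 256; ++c) { cum[c] = sum; sum += count[c]; }  // prefix sums
        std::vector<uint32_t> lf(n);
        for (size_t i = 0; i < n; ++i) lf[i] = cum[L[i]]++;   // LF links, one pass
        std::vector<uint8_t> out(n);
        uint32_t p = primary;
        for (size_t k = n; k-- > 0;) {                // walk the links; text comes out backwards
          out[k] = L[p];
          p = lf[p];
        }
        return out;
      }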
    Attached Files
    Last edited by Jan Ondrus; 6th October 2009 at 19:45.

  15. #44
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Very nice. I modified your config file to take a parameter to select block size, and modified bwt to also take the same parameter. Now you can compress:

    zpaq cbwt_j2.cfg,10 archive files...

    to select a 1 MB block size. In general the size is 2^(n+10)-256 bytes, where n can be 0 through 18 (a 256 MB block, using 1.25 GB memory).

    I also added a loop to clear the H array so now it works when compressing more than one file at a time.

    To make parameter passing work I modified bwt.cpp (now bwtpre.cpp) to take the same parameter. bwtpre just does a fast, forward BWT transform. I took out all the other code. It works like this:

    bwtpre n input output

    where n is the same as $1 in the config file.
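
    As a rough illustration of what bwtpre computes (not its actual code, sort algorithm, or output layout), a naive forward BWT of one block might look like this:

    Code:
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      // Naive forward BWT of one block: sort rotation indices, emit the last
      // column, return the row of the original string (the "pointer").
      // O(n^2 log n) worst case -- fine as an illustration, far slower than bwtpre.
      uint32_t bwt_forward(const std::vector<uint8_t>& in, std::vector<uint8_t>& out) {
        const size_t n = in.size();
        std::vector<uint32_t> idx(n);
        for (size_t i = 0; i < n; ++i) idx[i] = uint32_t(i);
        std::sort(idx.begin(), idx.end(), [&](uint32_t a, uint32_t b) {
          for (size_t k = 0; k < n; ++k) {            // compare rotations a and b
            uint8_t ca = in[(a + k) % n], cb = in[(b + k) % n];
            if (ca != cb) return ca < cb;
          }
          return a < b;                               // identical rotations
        });
        out.resize(n);
        uint32_t primary = 0;
        for (size_t i = 0; i < n; ++i) {
          out[i] = in[(idx[i] + n - 1) % n];          // last column
          if (idx[i] == 0) primary = uint32_t(i);     // row holding the original string
        }
        return primary;
      }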

    Anyway, I've been working on a ZPAQL compiler for v1.08. The idea is to convert ZPAQL to C++, write it to a header file, then recompile zpaq to use the compiled code the next time. I tested it with this bwt transform. On enwik7 (10 MB) it reduces compression time from 41 to 36 seconds and decompression from 33 to 23 seconds. I'm still working on compiling the COMP section as well. Should save a few more seconds, I hope.
    Attached Files

  16. #45
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    I removed the loop to clear the H array since it is not necessary.
    It didn't work for multiple files because of four zero bytes at the end of the file (marking EOF). I removed them from bwtpre.cpp.

    EDIT:
    And I don't think this is correct code for clearing H array:
    Code:
        a= $1+10 b=0 c=0 d=a
        do
          d-- *d=0
        a=b a> 0 while
    Attached Files
    Last edited by Jan Ondrus; 7th October 2009 at 10:59.

  17. #46
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    I have taken exe transformation from paq8px_v64 (exe_jo.cpp) and translated its inverse into ZPAQL (exe_j1.cfg).

    Comparison:
    acrord32.exe 3870784 bytes
    zpaq cexe.cfg ... 1131872 bytes
    zpaq cexe_j1.cfg ... 1084657 bytes
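
    For background, the usual idea behind this kind of filter is an E8/E9 (call/jump) transform: relative branch targets are rewritten as absolute addresses so that repeated calls to the same function produce repeated byte patterns. Below is a hedged, generic sketch; exe_jo's actual encoding, its exe detection, and its handling of the 0x0f80..0x0f8f conditional jumps differ in detail.

    Code:
      #include <cstddef>
      #include <cstdint>

      // Generic E8/E9 transform (not the exe_jo/paq8px encoding). forward=true
      // turns relative CALL/JMP targets into absolute ones; forward=false inverts it.
      void e8e9(uint8_t* buf, size_t n, bool forward) {
        for (size_t i = 0; i + 5 <= n; ++i) {
          if (buf[i] == 0xE8 || buf[i] == 0xE9) {     // CALL rel32 / JMP rel32
            uint32_t off = uint32_t(buf[i+1]) | (uint32_t(buf[i+2]) << 8) |
                           (uint32_t(buf[i+3]) << 16) | (uint32_t(buf[i+4]) << 24);
            uint32_t adj = uint32_t(i + 5);           // address of the next instruction
            uint32_t v = forward ? off + adj : off - adj;
            buf[i+1] = uint8_t(v);       buf[i+2] = uint8_t(v >> 8);
            buf[i+3] = uint8_t(v >> 16); buf[i+4] = uint8_t(v >> 24);
            i += 4;                                   // skip the operand on both passes
          }
        }
      }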
    Attached Files

  18. #47
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by Jan Ondrus View Post
    I removed the loop to clear the H array since it is not necessary.
    It didn't work for multiple files because of four zero bytes at the end of the file (marking EOF). I removed them from bwtpre.cpp.

    EDIT:
    And I don't think this is correct code for clearing H array:
    Code:
        a= $1+10 b=0 c=0 d=a
        do
          d-- *d=0
        a=b a> 0 while
    You are right, but I tried zpaq cbwt_j3.cfg,15 calgary\* and it failed after the first file. Here is the correct bug fix. I will upload to http://mattmahoney.net/dc/ as bwt_j4.zip shortly.

    Code:
      else (clear H at EOF for next file)
        a= 1 a<<= $1+10 b=0 c=0 d=a
        do
          d-- *d=0
        a=b a> 0 while
      endif
    Edit: uploaded now. Also I tested again on the Calgary corpus and your code does work after all. It had failed because I was testing with the old bwtpre with the zeros on the end instead of your new one. I just tested bwt_j4.cfg and it works with either the old or new bwtpre. I need to test again with the new one because your code is probably more efficient.
    Last edited by Matt Mahoney; 7th October 2009 at 17:46.

  19. #48
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Code:
      else (clear H at EOF for next file)
        a= 1 a<<= $1+10 b=0 c=0 d=a
        do
          d-- *d=0
        a=b a> 0 while
      endif
    Are you sure there shouldn't be a=d instead of a=b?

    And you don't need to clear the H array even with the old bwtpre.
    (Actually it is not cleared by your buggy code.)
    It will still work with only resetting register c to 0:
    Code:
      else (clear H at EOF for next file)
        c=0
      endif
    Last edited by Jan Ondrus; 7th October 2009 at 18:53.

  20. #49
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by Jan Ondrus View Post
    I have taken exe transformation from paq8px_v64 (exe_jo.cpp) and translated its inverse into ZPAQL (exe_j1.cfg).

    Comparison:
    acrord32.exe 3870784 bytes
    zpaq cexe.cfg ... 1131872 bytes
    zpaq cexe_j1.cfg ... 1084657 bytes
    Very nice. Also, mso97.dll from 1,530,715 to 1,464,603.

    I added exe_j0.exe and a readme and uploaded to http://mattmahoney.net/dc/

  21. #50
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by Jan Ondrus View Post
    Are you sure there shouldn't be a=d instead of a=b?

    And you don't need to clear the H array even with the old bwtpre.
    (Actually it is not cleared by your buggy code.)
    It will still work with only resetting register c to 0:
    Code:
      else (clear H at EOF for next file)
        c=0
      endif
    Yes, you're right again. I am removing bwt_j4 from my site and will use your bwt_j3.

  22. #51
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Very nice. Also, mso97.dll from 1,530,715 to 1,464,603.

    I added exe_j0.exe and a readme and uploaded to http://mattmahoney.net/dc/
    Thanks.
    Actually the name is not exe_j0.exe but exe_jo.exe.
    It also contains exe detection, so you can efficiently transform mixed data which contains some exe code. Conditional jumps (0x0f80..0x0f8f instructions) are also transformed.

    EDIT:
    So the readme file needs correction: exe_j0 -> exe_jo
    Last edited by Jan Ondrus; 7th October 2009 at 19:09.

  23. #52
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I updated both bwt_j3.zip and exe_j1.zip. bwt_j3 now contains bwtpre.exe and a readme file. bwtpre is now version 1.2. In exe_j1 I changed j0 to jo in the readme.

  24. #53
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    http://mattmahoney.net/dc/zpaq108b.cpp

    This is a preview of zpaq v1.08. It partially implements compiled configurations to improve speed. Some examples (size, compression time, decompression time tested on a 2 GHz T3200, Vista 32 bit)

    zpaq cmin.cfg enwik8 -> 33,460,960, 62s, 57s; optimized 48s, 45s.
    zpaq cmid.cfg enwik8 -> 20,941,558, 375s, 383s; optimized 330s, 341s.
    zpaq cbwt_j3.cfg enwik8 -> 20,756,743, 473s, 290s; optimized 414s, 259s.

    To optimize, use the "o" modifier with any of the commands, e.g. "ol" to list, "oc" or "oa" to compress, "ox" to extract, "or" to run. The result is that 2 files will be created: zpcache.h and zprun.h. Then compile zpaq with -DOPT. This will tell it to include these files. The newly compiled program will then be faster on any archives or config files using the same configurations. For example, suppose you have an archive compressed in a new format and you want to decompress it as fast as possible. Instead of

    zpaq x archive

    do

    zpaq ol archive (list and produce optimization code)
    g++ zpaq.cpp [usual optimizations] -DNDEBUG -DOPT
    zpaq x archive (extract faster)

    Suppose you want to compress a file with max.cfg faster. Instead of

    zpaq cmax.cfg archive file

    do (and this is not the only way)

    touch empty (make a small file)
    zpaq ocmax.cfg empty.zpaq empty (creates zpcache.h, zprun.h)
    del empty.zpaq empty (we don't need these any more)
    g++ zpaq.cpp [-O2 etc] -DNDEBUG -DOPT
    zpaq cmax.cfg archive file (compress for real)

    Any already optimized configurations are written to the .h files so they are not lost. Any time zpaq sees a new config file or archive, it looks up the model in the list stored in zpcache.h. If it finds a match, it runs the corresponding compiled ZPAQL code in zprun.h instead of interpreting the code as usual. If it can't find a match, it interprets the code. A config file or archive block header has 2 models written in ZPAQL byte code, HCOMP and PCOMP. HCOMP calculates the contexts for the model. PCOMP is optional and post-processes the decoder output. It is also run during compression to test the preprocessor.

    Translating ZPAQL to C++ is pretty straightforward. Most instructions are simple statements, e.g. "a=*d" translates to "a=h(d);" which means a=h[d&(h.size()-1)]; Jump instructions (jt, jf, jmp, lj) translate to "goto" or "if (f) goto". I wrote a little perl script that reads the big switch statement in ZPAQL::execute() and generates most of the translations.
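
    As a hedged illustration of that scheme (names invented; the real generated zprun.h is structured differently), min.cfg's one-line HCOMP "*d<>a a^=*d a<<= 8 *d=a halt" would come out roughly as:

    Code:
      #include <cstdint>
      #include <vector>

      struct MiniZPAQL {                       // toy stand-in for the ZPAQL state
        uint32_t a = 0, d = 0;                 // two of the registers
        std::vector<uint32_t> H;               // array H, power-of-2 size
        explicit MiniZPAQL(int hbits) : H(1u << hbits, 0) {}
        uint32_t& h(uint32_t i) { return H[i & (H.size() - 1)]; }   // the h(d) accessor

        void hcomp(uint32_t input) {           // called once per byte, input passed in A
          a = input;
          { uint32_t t = a; a = h(d); h(d) = t; }  // *d<>a  (swap A with H[D])
          a ^= h(d);                               // a^=*d
          a <<= 8;                                 // a<<= 8
          h(d) = a;                                // *d=a
          return;                                  // halt
        }
      };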

    zprun.h contains code for ZPAQL::run() as a switch statement that switches on the selected model, then executes the translated ZPAQL statements. ZPAQL::init() compares its model (a string of bytes) to a list in zpcache.h, which defines a long string of concatenated headers as a linked list. If it finds a match, it sets ZPAQL::select, which is used by run(). If it doesn't find a match, then it sets select to 0 and appends its model to the list. A 0 means to use the interpreted version of run(). If you use the "o" option, then when everything else is done the cache is used to generate new versions of zpcache.h and zprun.h, so that any previous optimizations are preserved.

    v1.08b is a preview, not a release. I plan to optimize Predictor::predict() and Predictor::update() too. Compiling here will save less time per iteration (mostly by replacing parameters with constants and maybe mixer loop unrolling) but there is more opportunity to save time because the predictor is run every bit instead of every byte.

  25. #54
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    http://mattmahoney.net/dc/zpaq108c.cpp

    I implemented compiled code in the COMP section in addition to PCOMP. After optimization, it uses 50% to 70% as much time as v1.07. In particular, max.cfg and variations are almost twice as fast.

    This will be v1.08 if it passes additional tests that I will do tomorrow. v1.08c is not official yet, however.

    Also, when you run the program with no arguments, it will show you which models have been optimized. (Not all the details, but the number of components and memory needed). Also, if a model is optimized and doesn't require a preprocessor then you can compress without the corresponding config file. For example:

    zpaq otrmax.cfg (optimize for max.cfg to zpaqopt.h)
    g++ -O2 -fomit-frame-pointer -march=pentiumpro -s zpaq.cpp -o zpaq -DNDEBUG -DOPT (includes zpaqopt.h)
    zpaq (shows model 1 is available)
    zpaq c,1 archive files... (compresses with max.cfg)

    This is also convenient if you have an archive and want to compress some other files in the same format but don't have the config file. The first command would be "zpaq ol archive" to optimize and list the contents.

    All of the optimization code goes into one file, zpaqopt.h, when you use the "o" modifier on any command. "otrmax.cfg" means optimize, trace with no arguments, run max.cfg. But since there are no additional arguments the config file doesn't actually run. This is a convenient way to generate zpaqopt.h for a config file. The code is also automatically commented, which is handy for disassembling archive headers using unknown compression algorithms.

  26. #55
    Member
    Join Date
    Aug 2009
    Location
    Bari
    Posts
    74
    Thanks
    1
    Thanked 1 Time in 1 Post
    yeessss, thanks Matt Mahoney

  27. #56
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    One more test.

    zpaq cmax4.cfg enwik9 -> 157,246,349
    Unoptimized (v1.03 from LTCB): 14061 sec, 13077 sec.
    Optimized (v1.08): 7555 sec, 7579 sec.

  28. #57
    Member
    Join Date
    Aug 2009
    Location
    Bari
    Posts
    74
    Thanks
    1
    Thanked 1 Time in 1 Post

  29. #58
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    bmp_j1.cfg is a ZPAQ configuration file for compressing 24-bit BMP images.
    It reads the image width from bytes 19 and 20 in the file (standard 54-byte BMP header). To compress 24-bit image data in another format, you need to replace this line
    Code:
      a=c a== 20 if b=c a=*b a<<= 8 b-- a+=*b a*= 3 r=a 0 endif (r0=w)
    to load the image width into the R[0] register.
    The compression model is taken from paq8px_v64. Preprocessing is not performed in the current version. It accepts one parameter for setting memory requirements.
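
    In C++ terms, the header read above amounts to something like the hedged sketch below (a standard BMP stores the width as a 32-bit little-endian value at offset 18; the config apparently uses only its low two bytes, and row padding is ignored):

    Code:
      #include <cstdint>
      #include <cstdio>

      int main(int argc, char** argv) {
        if (argc < 2) return 1;
        FILE* f = fopen(argv[1], "rb");
        if (!f) return 1;
        uint8_t hdr[54];                              // standard 54-byte BMP header
        if (fread(hdr, 1, 54, f) != 54) { fclose(f); return 1; }
        uint32_t width = hdr[18] | (hdr[19] << 8);    // low 16 bits of the LE width field
        printf("width=%u  r0=%u\n", width, width * 3);  // width*3 = bytes per row, as in R0
        fclose(f);
        return 0;
      }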

    Results:
    rafale.bmp ... 4149414 bytes
    zpaq cmax.cfg ... 714263 bytes
    zpaq cbmp_j1.cfg ... 538228 bytes
    Attached Files
    Last edited by Jan Ondrus; 9th October 2009 at 18:57.

  30. #59
    Member
    Join Date
    Aug 2009
    Location
    Bari
    Posts
    74
    Thanks
    1
    Thanked 1 Time in 1 Post
    Why don't you make just one .cfg file which includes filters for many file types?
    Sorry if the request is too indiscreet.

  31. #60
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by PiPPoNe92 View Post
    Why don't you make just one .cfg file which includes filters for many file types?
    Sorry if the request is too indiscreet.
    The plan is for zpaq to be a tool for writing compressors that don't need config files or external preprocessors. Ideally a compressor would have many different models built in and select one by looking at the file, or trying several and picking the best one. Preprocessors would be built in. It would be up to the developer to test them internally or not.

    zpaq v1.08 won't quite achieve that. I am working on a simpler user interface, however, something like "ocF" command to optimize and compress with config file F in one step, or "ox" to optimize and decompress in one step. Maybe another command to produce optimized self extracting archives.
