zpaq 1.07 http://mattmahoney.net/dc/#zpaq
Config files can now take arguments. For example:
zpaq cmin.cfg x calgary\* -> 1030817, 4 MB, 2.1 sec
zpaq cmin.cfg,1 x calgary\* -> 1016944, 8 MB, 2.3 sec
zpaq cmin.cfg,2 x calgary\* -> 1004572, 16 MB, 2.6 sec
zpaq cmin.cfg,3,1 x calgary\* -> 998667, 33 MB, 3.1 sec
zpaq cmin.cfg,-1 x calgary\* -> 1058727, 1 MB, 1.9 sec
The first argument doubles memory for each increment. The second argument increases the LZP minimum match length. The arguments are passed as $1 and $2 in the config file and default to 0. You can add a constant to them in the code (but no other arithmetic is allowed), as in $1+20, which means that if you pass 3 as the first argument it becomes 23.
Here is the new min.cfg. Note that the parameters are used to set PH and PM (log2 of the sizes of arrays H and M, used as the LZP index and buffer) and are also passed to lzppre.exe as arguments so that it uses arrays of the same size. $2+2 appears both as an argument to lzppre.exe, to set the minimum match length, and in the ZPAQL code, so that it decodes to the same length. These also have to match or the transform fails (zpaq checks for this).
Code:
(zpaq 1.07 minimum (fast) compression.
Uses 4 x 2^$1 MB memory. $2 increases minimum match length)
comp 3 3 $1+18 $1+20 1 (hh hm PH PM n)
0 cm $1+19 5 (context model size=2^19 when $1 is 0, limit=5*4)
hcomp
*d<>a a^=*d a<<= 8 *d=a (order 2 context)
halt
pcomp lzppre $1+18 $1+20 127 $2+2 96 ;
(lzppre PH PM ESC MINLEN HMUL)
(If you change these values, then change them in the code too)
(The sequence ESC 0 codes for ESC. The sequence ESC LEN
codes for a match of length LEN+MINLEN at the last place
in the output buffer M (size 2^PM) that had the same context
hash in the low PH bits of D. D indexes hash table H
which points into buffer M, which contains B bytes.
When called, A contains the byte to be decoded and F=true
if the last byte was ESC. The rolling context hash D is
updated by D=D*HMUL+M[B])
if (last byte was ESC then copy from match)
a> 0 jf 50 (goto output esc)
a+= $2+2 (MINLEN)
r=a 0 (save length in R0)
c=*d (c points to match)
do (find match and output it)
*d=b (update index with last output byte)
a=*c *b=a b++ c++ out (copy and output matching byte)
d<>a a*= 96 (HMUL)
a+=d d<>a (update context hash)
a=r 0 a-- r=a 0 (decrement length)
a> 0 while (repeat until length is 0)
halt
endif
(otherwise, set F for ESC)
a== 127 (ESC) if
halt
endif
(reset state at EOF)
a> 255 if
b=0 c=0 a= 1 a<<= $1+18 d=a
do
d-- *d=0 a=d (clear index)
a> 0 while
halt (F=0)
(goto here: output esc)
a= 127 (ESC)
endif
*d=b (update index)
*b=a b++ out (update buffer and output)
d<>a a*= 96 (HMUL)
a+=d d<>a (update context hash)
halt
end
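
As a rough illustration of how the $1+20 style expressions in the config above could be expanded, here is a minimal Python sketch. It is not zpaq's parser: the function name expand_args and the regular expression are assumptions made for this example. It only handles a single-digit argument number with an optional added constant, and treats missing arguments as 0, as described above.

```python
import re

def expand_args(text, args):
    """Replace $N and $N+M with integers; missing arguments count as 0.

    Hypothetical helper for illustration only, not part of zpaq.
    """
    def repl(match):
        index = int(match.group(1))                      # 1-based argument number
        value = args[index - 1] if index <= len(args) else 0
        offset = int(match.group(2)) if match.group(2) else 0
        return str(value + offset)
    return re.sub(r"\$(\d)(?:\+(\d+))?", repl, text)

line = "pcomp lzppre $1+18 $1+20 127 $2+2 96 ;"
print(expand_args(line, [3, 1]))   # pcomp lzppre 21 23 127 3 96 ;
print(expand_args(line, []))       # pcomp lzppre 18 20 127 2 96 ;
print(expand_args(line, [-1]))     # pcomp lzppre 17 19 127 2 96 ;
```

The first call corresponds to min.cfg,3,1: PH becomes 21, PM becomes 23 and the minimum match length becomes 3, and the same numbers appear wherever $1+18, $1+20 and $2+2 are used in the ZPAQL code.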
One minor (but incompatible) change is that the external preprocessor command now has to end with ; (preceded by a space). There is also a bug fix: I had to add code to clear the index at EOF to keep the postprocessor synchronized with the preprocessor. Without it, decoding errors occurred when more than one file was compressed at a time. Alternatively, I could have fixed it by saving the index to archive.$zpaq.tmp between calls, but clearing the index was easier, although slower. The code to clear H is:
Code:
a= 1 a<<= $1+18 d=a (clear array H)
do
d-- *d=0 a=d
a> 0 while
The size of H is 2^($1+18). The loop also exits with d=0 and F=0, which was the initial state (d always points to H).
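
To make the role of the reset concrete, here is a toy Python model of the postprocessor. It is written from the comments in min.cfg, not from the zpaq source, so treat it as a sketch under assumptions: the class name LzpPostprocessor, the feed method and the EOF constant are hypothetical, and the EOF test is done first for simplicity, whereas the config tests F first.

```python
EOF = 0xFFFFFFFF  # assumed marker; any value above 255 is treated as EOF, like "a> 255"

class LzpPostprocessor:
    """Toy model of the LZP decoding steps described in min.cfg (illustration only)."""

    def __init__(self, ph=18, pm=20, esc=127, minlen=2, hmul=96):
        self.ph, self.pm = ph, pm
        self.esc, self.minlen, self.hmul = esc, minlen, hmul
        self.h = [0] * (1 << ph)        # index H: context hash -> position in M
        self.m = bytearray(1 << pm)     # buffer M: bytes output so far
        self.b = 0                      # next write position in M
        self.d = 0                      # rolling context hash
        self.f = False                  # True if the previous input byte was ESC
        self.out = bytearray()

    def _emit(self, byte):
        self.h[self.d % len(self.h)] = self.b               # *d=b (update index)
        self.m[self.b % len(self.m)] = byte                 # *b=a (update buffer)
        self.b += 1
        self.out.append(byte)
        self.d = (self.d * self.hmul + byte) & 0xFFFFFFFF   # D=D*HMUL+byte

    def feed(self, a):
        if a > 255:                     # EOF: go back to the initial state
            self.b = 0
            self.d = 0
            self.f = False
            self.h = [0] * (1 << self.ph)   # same effect as the clearing loop
        elif self.f:                    # this byte follows an ESC
            self.f = False
            if a == 0:
                self._emit(self.esc)    # ESC 0 codes for a literal ESC
            else:
                c = self.h[self.d % len(self.h)]    # c=*d, read once before copying
                for _ in range(a + self.minlen):    # match length is LEN+MINLEN
                    self._emit(self.m[c % len(self.m)])
                    c += 1
        elif a == self.esc:
            self.f = True               # remember the ESC, output nothing yet
        else:
            self._emit(a)               # ordinary literal

post = LzpPostprocessor()
for byte in b"abcdeabcd" + bytes([127, 1]):   # 9 literals, then ESC 1 (match of length 3)
    post.feed(byte)
print(bytes(post.out))                  # b'abcdeabcdeab'
post.feed(EOF)
print(post.b, post.d, post.f)           # 0 0 False
```

After EOF the model is back in its initial state (b=0, d=0, F=0, H all zero), which is what the clearing loop restores in the config before the next file is decoded.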
v1.07 also cleans up the display when tracing code. When it dumps large memory arrays, it omits lines of all zeros.
I cleaned up the source code too. Class ZPAQL has fewer data members and a simpler interface. I took compile() out of the class.
mid.cfg and max.cfg now take 1 argument to double memory usage:
zpaq cmax.cfg x calgary\* -> 644433, 35.8 sec, 244 MB
zpaq cmax.cfg,1 x calgary\* -> 644320, 36.4 sec, 476 MB
zpaq cmax.cfg,-5 x calgary\* -> 665326, 34.2 sec, 22 MB