zpaq 1.05 has been posted. http://mattmahoney.net/dc/#zpaq
The config files describe a compression algorithm: a context mixing model (an arrangement of various bit predictors), a program (in ZPAQL) to compute their contexts, and optionally another ZPAQL program to post-process the data after it has been decoded. If you use post-processing, then you also need a preprocessor that does the inverse, but the preprocessor is only needed during compression. When you compress, ZPAQ runs the preprocessor named in the config file to create a temporary file. Then it runs your postprocessor on the temporary file to check that it restores the original data. If it does not, ZPAQ refuses to compress the file because it would not decompress correctly. If the test passes, ZPAQ compresses the temporary file and stores a description of the model and the postprocessing code in the archive. When decompressing, you don't need any other files.
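To make the compress-time check concrete, here is a rough sketch in Python. It is not how zpaq.cpp does it (ZPAQ runs an external program and a ZPAQL postprocessor); the function name `preprocess_checked` and the toy `add_one`/`sub_one` pair are made up for illustration.

```python
from typing import Callable

Transform = Callable[[bytes], bytes]


def preprocess_checked(data: bytes, pre: Transform, post: Transform) -> bytes:
    """Return pre(data), but only if post() restores the original exactly."""
    temp = pre(data)  # stands in for the temporary file
    if post(temp) != data:
        # Mirrors the refusal described above: never store data
        # that the postprocessor cannot turn back into the input.
        raise ValueError("postprocessor does not restore the input")
    return temp


# Hypothetical toy transform pair: add 1 to every byte, and its inverse.
def add_one(data: bytes) -> bytes:
    return bytes((x + 1) & 255 for x in data)


def sub_one(data: bytes) -> bytes:
    return bytes((x - 1) & 255 for x in data)
```

Calling `preprocess_checked(data, add_one, sub_one)` returns the transformed bytes, while passing a wrong inverse (say `add_one` twice) raises instead of silently producing an archive that can't be restored.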
This differs from version 1.04, which had only 2 preprocessors (e8e9 and lzp), both built in, and no compress-time check (not that I found any files where they failed). In v1.05 I made these external. There is a program lzppre.exe (in C++) and a config file exepre.cfg (in ZPAQL) that are called from min.cfg and exe.cfg respectively. exe.cfg is just max.cfg with the e8e9 transform.
In v1.04, min.cfg looks like this:
Code:
comp 3 3 18 20 1 (hh hm ph pm n)
0 cm 19 5 (context model size=2^19, limit=5*4)
hcomp
*d<>a a^=*d a<<= 8 *d=a (order 3 context)
halt
post
p 127 2 96 (LZP esc minlen hmul (order 4, min length 3))
end
The command "p 127 2 96" tells v1.04 to run the built-in LZP preprocessor using 127 as the escape code, 2 as the minimum match length minus 1, and 96 as the hash multiplier (which determines the context order). v1.04 would then insert the corresponding postprocessor code into the archive automatically.
v1.05 does not have any internal preprocessors. min.cfg looks like this:
Code:
(zpaq 1.05 minimum (fast) compression)
comp 3 3 18 20 1 (hh hm PH PM n)
0 cm 19 5 (context model size=2^19, limit=5*4)
hcomp
*d<>a a^=*d a<<= 8 *d=a (order 3 context)
halt
pcomp lzppre 18 20 127 2 96
(lzppre PH PM ESC MINLEN HMUL)
(If you change these values, then change them in the code too)
(The sequence ESC 0 codes for ESC. The sequence ESC LEN
codes for a match of length LEN+MINLEN at the last place
in the output buffer M (size 2^PM) that had the same context
hash in the low PH bits of D. D indexes hash table H
which points into buffer M, which contains B bytes.
When called, A contains the byte to be decoded and F=true
if the last byte was ESC. The rolling context hash D is
updated by D=D*HMUL+M[B])
if (last byte was ESC then copy from match)
a> 0 jf 37 (goto output esc)
a+= 2 (MINLEN)
r=a 0 (save length in R0)
c=*d (c points to match)
do (find match and output it)
*d=b (update index with last output byte)
a=*c *b=a b++ c++ out (copy and output matching byte)
d<>a a*= 96 (HMUL)
a+=d d<>a (update context hash)
a=r 0 a-- r=a 0 (decrement length)
a> 0 while (repeat until length is 0)
halt
endif
(otherwise, set F for ESC)
a== 127 (ESC) if
halt
endif
(reset state at EOF)
a> 255 if
a>a halt (F=0)
(goto here: output esc)
a= 127 (ESC)
endif
*d=b (update index)
*b=a b++ out (update buffer and output)
d<>a a*= 96 (HMUL)
a+=d d<>a (update context hash)
halt
end
The COMP and HCOMP sections are the same as in v1.04. They describe the context model (actually a simple order 2, not order 3 as the comment says). The PCOMP section starts with the preprocessor command. v1.05 includes a program lzppre.exe (and lzppre.cpp source) that was taken from the old class PreProcessor (now gone) and turned into a standalone program. The postprocessing code is no longer generated automatically; ZPAQ takes it from the config file and stores it in the archive.
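If you don't read ZPAQL, the postprocessor above can be approximated in Python. Treat this as a sketch: the decoder follows the comments in min.cfg (ESC 0 is a literal ESC, ESC LEN copies LEN+MINLEN bytes from where the hash table points), but it skips the EOF reset, and the encoder is a hypothetical one I wrote only to exercise the decoder. It is not a translation of lzppre.cpp, and `LzpState` and the function names are invented here.

```python
PH, PM = 18, 20                  # hash table and buffer size bits, as in min.cfg
ESC, MINLEN, HMUL = 127, 2, 96
HMASK, MMASK = (1 << PH) - 1, (1 << PM) - 1
MAXLEN = 255 + MINLEN            # longest match one ESC LEN pair can code


class LzpState:
    """State shared by both sides: hash table H, buffer M, pointer B, hash D."""

    def __init__(self):
        self.h = [0] * (1 << PH)
        self.m = bytearray(1 << PM)
        self.b = 0
        self.d = 0

    def predicted(self):
        return self.h[self.d & HMASK]   # c=*d

    def emit(self, byte):
        self.h[self.d & HMASK] = self.b                # *d=b
        self.m[self.b] = byte                          # *b=a
        self.b = (self.b + 1) & MMASK                  # b++
        self.d = (self.d * HMUL + byte) & 0xFFFFFFFF   # D=D*HMUL+M[B]


def lzp_decode(code: bytes) -> bytes:
    st, out, esc = LzpState(), bytearray(), False
    for a in code:
        if esc:
            esc = False
            if a > 0:                        # ESC LEN: copy a match
                c = st.predicted()
                for _ in range(a + MINLEN):
                    byte = st.m[c & MMASK]
                    c += 1
                    out.append(byte)
                    st.emit(byte)
            else:                            # ESC 0: literal ESC
                out.append(ESC)
                st.emit(ESC)
        elif a == ESC:
            esc = True                       # remember it, output nothing yet
        else:
            out.append(a)
            st.emit(a)
    return bytes(out)


def lzp_encode(data: bytes) -> bytes:
    """Hypothetical encoder; stops a match before it overlaps new output."""
    st, out, i = LzpState(), bytearray(), 0
    while i < len(data):
        c, n = st.predicted(), 0
        while (n < MAXLEN and i + n < len(data)
               and ((c + n) & MMASK) != st.b
               and st.m[(c + n) & MMASK] == data[i + n]):
            n += 1
        if n > MINLEN:
            out += bytes([ESC, n - MINLEN])
        else:
            n = 1
            out.append(data[i])
            if data[i] == ESC:
                out.append(0)                # ESC 0 codes for ESC
        for byte in data[i:i + n]:
            st.emit(byte)
        i += n
    return bytes(out)
```

The point of sharing `emit` is that both sides update H, M, and D in exactly the same way for every output byte, which is why the decoder can find the match from the hash alone without an offset being stored.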
You might wonder how this is an improvement.
The intention is to make it easier to write new transforms, for example, a dictionary transform for text or a 2-D delta transform for images. Before, you had to modify class PreProcessor in zpaq.cpp to add transform code, then write and debug the inverse in ZPAQL and do lots of testing to make sure it never failed to recover the data. Now you don't need to touch zpaq.cpp. You write a separate preprocessor program in any language you want and the postprocessing code in ZPAQL in the config file. ZPAQ will run the input through both programs to make sure the input is recoverable. There are also tools in ZPAQ to debug ZPAQL programs. For example, you can run a config file as a regular program. I did this with the e8e9 transform because the encoder and decoder differ in only 1 instruction, so writing the encoder in ZPAQL was easier than writing it in C++. exe.cfg (max.cfg with e8e9) does this:
Code:
pcomp zpaq rexepre.cfg
exepre.cfg contains the preprocessor code. The "r" command runs the program with an input file and an output file as its 2 arguments, which PCOMP conveniently appends to the command.
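An external preprocessor in another language only has to follow the same convention: take whatever arguments you put on the PCOMP line, then the input and output file names that PCOMP appends. Here is a minimal sketch in Python, assuming a simple 1-D byte delta as the transform. The name `deltapre` and both functions are hypothetical, and `delta_decode` is only a reference for testing: the real inverse would still have to be written in ZPAQL in the config file.

```python
import sys


def delta_encode(data: bytes) -> bytes:
    """Replace each byte by its difference from the previous byte (mod 256)."""
    prev, out = 0, bytearray()
    for x in data:
        out.append((x - prev) & 255)
        prev = x
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Reference inverse, for checking the encoder only."""
    prev, out = 0, bytearray()
    for x in data:
        prev = (prev + x) & 255
        out.append(prev)
    return bytes(out)


def main(argv):
    # The last two arguments are the input and output file names;
    # anything before them comes from the PCOMP line itself.
    in_name, out_name = argv[-2], argv[-1]
    with open(in_name, "rb") as f:
        data = f.read()
    with open(out_name, "wb") as f:
        f.write(delta_encode(data))
    return 0


if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```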