Uh, it's used like this:
Code:
PPMTrain.exe BOOK1 // creates !PPMd.mdl
PPMd1.exe e BOOK2 // uses !PPMd.mdl, it's hardcoded
As for options, the additional -s in ppmtrain sets the size of the created !PPMd.mdl:
ppmtrain applies its tree reduction algo (the same one as in ppmd -r1) until the model reaches the required size.
But saving statistics in such a way is probably not the best idea.
If it's just to measure the similarity of files, it should be ok to just compress the first file separately, then
the two files together, and calculate the difference in compressed size.
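To make the size-difference idea concrete, here's a rough sketch in Python. Note that it uses zlib from the standard library purely as a stand-in for ppmd, and the names csize and extra_cost are made up for this example. zlib only looks back 32 KB, so for larger files you'd want to swap in a stronger compressor; the arithmetic stays the same.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes. zlib is only a stand-in for ppmd here."""
    return len(zlib.compress(data, 9))


def extra_cost(base: bytes, other: bytes) -> int:
    """How many compressed bytes `other` adds when appended to `base`.

    A small value means `other` is well predicted by `base`, i.e. similar.
    """
    return csize(base + other) - csize(base)


if __name__ == "__main__":
    # Toy data (assumption: real use would read two files instead).
    base = b"the quick brown fox jumps over the lazy dog. " * 100
    similar = b"the quick brown fox jumps over the lazy cat. " * 100
    rng = random.Random(0)
    unrelated = bytes(rng.getrandbits(8) for _ in range(len(similar)))

    print("similar:  ", extra_cost(base, similar))
    print("unrelated:", extra_cost(base, unrelated))
```

The smaller the number, the more of the second file was already "known" from the first one. To compare files of different sizes you'd probably want to divide by csize(other) or something like that.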
Also in most cases the same approach actually works for creating file diffs too -
Code:
ppmd e book1 1
ppmd e book1+book2 2
diff 1 2 3
If you really need an artificial dictionary as a reference, instead of a specific file,
then an interesting option may be to use ppmd -r2 to compress the base data,
then introduce an error in the "frozen" part of the compressed file. After that, on decoding,
ppmd would generate some random data using the model's statistics.
Anyway, the most efficient description of statistics is the compressed data -
some snapshot of memory structures would be either huge or lose important
information.
Also you might want to look at http://libbsc.com/ - its text compression is also good, but it's faster.