Thanks, but I don't drink alcohol
Sadly, those values have scrolled out of my command-line buffer; I only have these files left and some data:
1,036,817,655 bytes, size after treecapencode
367,356,817 bytes, size after treecompress
Last Iteration:
Iteration 842: 17.7148 bits per token, Shannon file size 208614318 bytes
Run time 174859 seconds. 6d9845.6d9845
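As a sanity check on that last iteration (an assumption on my part: the reported Shannon file size is just bits-per-token times the token count, divided by 8, with the 94,210,190 token count taken from the treebitencode log below):

```python
# Assumed relation: Shannon size = bits/token * tokens / 8
bits_per_token = 17.7148   # reported at iteration 842
tokens = 94_210_190        # token count reported by treebitencode

shannon_bytes = bits_per_token * tokens / 8
print(f"{shannon_bytes:,.0f}")  # within ~20 bytes of the reported 208,614,318
```

The tiny discrepancy is just rounding in the 4-decimal bits-per-token figure.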
treebitencode:
Read 94210190 tokens including 7014470 token definition symbols
Tokenize done
Embed tokens done
Compressed file of size 187933388 bytes uses a 7020759 'letter' alphabet
Does that mean each 'letter' uses 26.768 bytes on average (187933388 / 7020759)?
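For what it's worth, 187933388 / 7020759 ≈ 26.77 is bytes of compressed output per *distinct* alphabet letter; the average code length per token occurrence is a different number. A quick sketch of both, using only the figures from the treebitencode log (assumption: every one of the 94,210,190 tokens is a symbol from the 7,020,759-letter alphabet):

```python
import math

compressed_bytes = 187_933_388
alphabet_size = 7_020_759
tokens = 94_210_190

print(compressed_bytes / alphabet_size)   # ≈ 26.77 bytes per distinct 'letter'
print(compressed_bytes * 8 / tokens)      # ≈ 15.96 bits per token occurrence
print(math.log2(alphabet_size))           # ≈ 22.74 bits for a flat (uniform) code
```

So the entropy coding averages about 15.96 bits per token, well under the 22.74 bits a flat code over that alphabet would need.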
After a restart it shows:
treecompress enwik8:
Allocated 1946157056 bytes for data processing
Read 103340635 byte input file
Found 102962467 tokens, 0 defines, maximum unicode value 0x28b46
treecompress enwik9:
Allocated 1946157056 bytes for data processing
Read 1036817655 byte input file
Found 1034338546 tokens, 0 defines, maximum unicode value 0x28b4e
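From the byte counts quoted above, the stage-by-stage compression ratios can be checked directly (assuming the treebitencode output comes from the same pipeline run as the treecapencode and treecompress sizes):

```python
# Stage sizes reported above (bytes)
after_treecapencode = 1_036_817_655
after_treecompress  = 367_356_817
after_treebitencode = 187_933_388

print(after_treecompress / after_treecapencode)   # ≈ 0.354
print(after_treebitencode / after_treecompress)   # ≈ 0.512
print(after_treebitencode / after_treecapencode)  # ≈ 0.181, overall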