Another interesting preprocessor for highly redundant text files is Bulat's DICT (http://www.haskell.org/bz/).
FreeArc uses this DICT preprocessor for text-like files (in some modes).
Tested on FP.LOG (20.617.071 bytes)
timer dict -pt fp.log 1.dict -> 5.736.001 bytes (time 0.733s)
paq8o10
timer paq8o10 -7 fplog_paq8o10_7 fp.log
20617071 -> 264985 (1984.70 sec)
timer paq8o10 -7 1.dict fplog_dict_pre_paq8o10_7
5736001 -> 223619 (516.85 sec)
Even in paq8o10 -8 mode, the DICT-preprocessed input gives a smaller file
in about 25% of the time needed without DICT preprocessing !!
timer paq8o10 -8 fplog_paq8o10_8 fp.log
20617071 -> 263139 (2040.63 sec)
timer paq8o10 -8 fplog_dict_paq8o10_8 fp.dict
5736001 -> 223267 (518.68 sec)
Paq8o10t :
timer paq8o10t -8 fplog_paq8o10t_8 fp.log
20617071 -> 258587
timer paq8o10t -8 fplog_dict_paq8o10t_8 fp.dict
5736001 -> 236272 (416.76 sec)
dict + paq8o10 outputs a smaller file than the new paq8o10t !
Paq8o10t has some room for improvement !
Maybe paq8o10t could use this preprocessor
CMM4 v0.1f :
timer cmm4 57 fp.log fplog_cmm4_1f_57.cmm4f
Ratio: 426815/20617071 bytes (0.17 bpc) (Time: 16.07 s)
timer cmm4 57 fp.dict fplog_dict_cmm4_1f_57.cmm4f
Ratio: 357406/5736001 bytes (0.50 bpc) (Time: 5.33 s)
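Note on the two bpc figures above: each is measured against the file that was fed to CMM4, so the 0.50 bpc for fp.dict is relative to the 5.736.001 byte DICT output, not to the original log. A small sketch of the arithmetic, using only the sizes quoted above (the function and variable names are just for illustration):

```python
# Sizes copied from the CMM4 v0.1f runs above.
ORIGINAL_SIZE = 20_617_071   # fp.log
DICT_SIZE = 5_736_001        # fp.dict (output of the DICT pass)

def bpc(compressed_bytes, input_bytes):
    """Bits of compressed output per byte (character) of input."""
    return 8.0 * compressed_bytes / input_bytes

plain = bpc(426_815, ORIGINAL_SIZE)             # fp.log, as CMM4 reports it
dict_reported = bpc(357_406, DICT_SIZE)         # fp.dict, as CMM4 reports it
dict_vs_original = bpc(357_406, ORIGINAL_SIZE)  # same archive, measured against fp.log

print(f"fp.log              : {plain:.2f} bpc")
print(f"fp.dict (reported)  : {dict_reported:.2f} bpc")
print(f"fp.dict vs. fp.log  : {dict_vs_original:.2f} bpc")
```

Measured against the original 20.617.071 bytes, the DICT + CMM4 result is the better of the two, even though the reported 0.50 bpc looks worse at first glance.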
RZM 0.07h
timer rzm c fp.log fplog_rzm007h_nr2.rzm
20133kb -> 494kb (506420b, 2.46%), done.
--> 506.427 bytes (rzm reports 7 bytes less !!)
time = 19.235s
timer rzm c fp.dict fplog_dict_rzm007h.rzm
5601kb -> 421kb (431458b, 7.52%), done.
-> 431.465 bytes (rzm reports 7 bytes less)
time = 5.148s (about 3.7x faster !!)
BALZ v1.13
timer balz ex fp.log fplog_balz113.balz
-> 551055 bytes in 20.904 sec
timer balz ex fp.dict fplog_dict_balz113.balz
-> 514806 bytes in 6.615 sec
timer balz e fp.log fplog_balz113_e_nr2.balz
-> 662.941 bytes in 2.777s
timer balz e fp.dict fplog_dict_balz113_e.balz
-> 543.768 bytes in 1.982s
Brute CM v0.1d2
timer bcm e fp.log fplog_bcm01d2.bcm
-> 786.749 bytes in 1.388s
timer bcm e fp.dict fplog_dict_bcm01d2.bcm
-> 580.857 bytes in 1.186s
BIT v0.2b
timer bit02b a fplog_bit02b_mem9 -m lwcx -mem 9 -files fp.log
-> 576.533 bytes in 26.498s
timer bit02b a fplog_dict_bit02b_mem9_nr1 -m lwcx -mem 9 -files fp.dict
-> 481.710 bytes in 9.110s
7z 4.59 a2 -mx9
fp.log -> 839.075 bytes
fp.dict -> 606.485 bytes
LPAQ 8e
lpaq8e 7 -> 358267 bytes
dict + lpaq8e -> 322625 bytes
lpaq8e 7 fp.log fplog_lpaq8e_7.lpaq8e
20617071 -> 358267 in 22.287 sec. using 390 MB memory
lpaq8e 7 fp.dict fplog_dict_lpaq8e_7.lpaq8e
5736001 -> 322625 in 7.426 sec. using 390 MB memory
Slim 023d
fplog_slim_o40_m1024.fb -> 343.637 bytes
(the smallest file I could get for different orders)
fplog_dict_prepr_slim_o9_m512.fb -> 319.601 bytes
PPMonstr J
fplog_ppmonstr_m900_o64.ppmm -> 355.722 bytes
fplog_dict_preproc_ppmd_o10_m256.ppmm -> 321.429 bytes
FP.LOG -order 10
timer ppmonstr e -ffplog_ppmonstrJ_o10_m900.ppmdm -o10 -m900 fp.log
Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
fp.log:20617071 > 472084, 0.183 bpb, used: 13.6MB, speed: 1793 KB/sec
Global Time = 11.232 = 00:00:11.232 = 100%
FP.LOG -order 64 gives smaller output :
timer ppmonstr e -ffplog_ppmonstrJ_o64_m1200.ppmdm -o64 -m1200 fp.log
Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
fp.log:20617071 > 355700, 0.138 bpb, used:621.0MB, speed: 913 KB/sec
Global Time = 22.106 = 00:00:22.106 = 100%
(PPMonstr reports 355700 bytes although the created file is 355722 bytes)
FP.DICT -o10
timer ppmonstr e -ffplog_dict_ppmonstrJ_o10_m500.ppmdm -o10 -m256 fp.dict
Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
fp.dict:5736001 > 321406, 0.448 bpb, used: 26.2MB, speed: 1061 KB/sec
Global Time = 5.304 = 00:00:05.304 = 100%
(PPMonstr reports 321406 bytes although the created file is 321.429 bytes)
PPMonstr J outputs a smaller file using dict preprocessor; both compression time and memory usage decrease.
FreeArc 0.50a (June 9 200
timer arc a -mx fplog_arc5a5_mx_nr4 fp.log
FreeArc 0.50 alpha (June 9 200
updating archive: fplog_arc5a5_mx.arc
Compressed 1 file, 20.617.071 => 527.623 bytes. Ratio 2.5%
Compression time 1.76 secs, speed 11.696 kB/s. Total 2.11 secs
All OK
Global Time = 2.146 = 00:00:02.146 = 100%
or a bit optimized :
timer arc a -mdict:30m+ppmd:9:900mb fplog_arc5a5_opt fp.log
FreeArc 0.50 alpha (June 9 200
creating archive: fplog_arc5a5_opt
Compressed 1 file, 20.617.071 => 489.353 bytes. Ratio 2.3%
Compression time 1.12 secs, speed 18.356 kB/s. Total 1.22 secs
All OK
Global Time = 1.358 = 00:00:01.358 = 100%
-> 489590 bytes in 1.35s !!
Conclusion (only for FP.LOG !!)
All tested compressors/archivers output a smaller file with DICT preprocessing. Compression itself also gets faster, up to about 4x for some compressors (paq8o10, RZM), not counting the 0.733 s DICT pass.
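To put the numbers side by side, here is a rough sketch that recomputes the size gain and the speedup for the runs where both a size and a time were posted above. The figures are copied from the logs in this post (on-disk sizes where they differ from the reported ones); the table layout and names are only for illustration, and the 0.733 s DICT pass is left out of the timings.

```python
# (name, size on fp.log, size on fp.dict, time on fp.log, time on fp.dict)
# All values copied from the runs in this post; times in seconds.
RUNS = [
    ("paq8o10 -7", 264985, 223619, 1984.70, 516.85),
    ("paq8o10 -8", 263139, 223267, 2040.63, 518.68),
    ("cmm4 57",    426815, 357406,   16.07,   5.33),
    ("rzm c",      506427, 431465,  19.235,  5.148),
    ("balz ex",    551055, 514806,  20.904,  6.615),
    ("balz e",     662941, 543768,   2.777,  1.982),
    ("bcm e",      786749, 580857,   1.388,  1.186),
    ("bit lwcx",   576533, 481710,  26.498,  9.110),
    ("lpaq8e 7",   358267, 322625,  22.287,  7.426),
]

def summarize(runs):
    """Return (name, percent smaller, speedup factor) for each run."""
    rows = []
    for name, plain_size, dict_size, plain_time, dict_time in runs:
        saving = 100.0 * (plain_size - dict_size) / plain_size
        speedup = plain_time / dict_time   # compressor time only
        rows.append((name, saving, speedup))
    return rows

rows = summarize(RUNS)
for name, saving, speedup in rows:
    print(f"{name:<12} {saving:5.1f}% smaller   {speedup:4.2f}x faster")

# Run with the largest speedup
best = max(rows, key=lambda row: row[2])
print("largest speedup:", best[0])
```

For the fast compressors (balz e, bcm) the time saved is smaller than the DICT pass itself, so the speedup mainly matters for the slow ones.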