
Thread: Dict preprocessor

  1. #1
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts

    Dict preprocessor

    Another interesting preprocessor for highly redundant text files is Bulat's DICT (http://www.haskell.org/bz/).
    FreeArc uses this DICT preprocessor for text-like files (in some modes).

    Tested on FP.LOG (20.617.071 bytes)

    timer dict -pt fp.log 1.dict -> 5.736.001 bytes (time 0.733s)

    paq8o10

    timer paq8o10 -7 fplog_paq8o10_7 fp.log
    20617071 -> 264985 (1984.70 sec)

    timer paq8o10 -7 1.dict fplog_dict_pre_paq8o10_7
    5736001 -> 223619 (516.85 sec)

    Even in paq8o10 -8 mode, the dict-preprocessed input gives a smaller file
    in about 25% of the time needed without dict preprocessing !!

    timer paq8o10 -8 fplog_paq8o10_8 fp.log
    20617071 -> 263139 (2040.63 sec)

    timer paq8o10 -8 fplog_dict_paq8o10_8 fp.dict
    5736001 -> 223267 (518.68 sec)
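
The "about 25%" figure can be checked directly from the paq8o10 numbers above. This is only a small sketch: the sizes and times are copied from the runs in this post, while the helper name `compare` and the dictionary layout are made up for illustration. The times cover the paq8o10 step only, not the 0.733s dict pass.

```python
# Figures copied from the paq8o10 runs above: (compressed bytes, seconds).
runs = {
    "-7": {"plain": (264985, 1984.70), "dict": (223619, 516.85)},
    "-8": {"plain": (263139, 2040.63), "dict": (223267, 518.68)},
}

def compare(plain, dict_):
    """Return (size saving in %, dict run time as % of the plain run time)."""
    plain_size, plain_time = plain
    dict_size, dict_time = dict_
    saving = 100.0 * (plain_size - dict_size) / plain_size
    time_share = 100.0 * dict_time / plain_time
    return saving, time_share

results = {mode: compare(r["plain"], r["dict"]) for mode, r in runs.items()}
for mode, (saving, time_share) in results.items():
    print(f"paq8o10 {mode}: {saving:.1f}% smaller, {time_share:.1f}% of the time")
```

Both modes come out at roughly a quarter of the original time with a file about 15% smaller.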

    Paq8o10t :
    timer paq8o10t -8 fplog_paq8o10t_8 fp.log
    20617071 -> 258587

    timer paq8o10t -8 fplog_dict_paq8o10t_8 fp.dict
    5736001 -> 236272 (416.76 sec)

    dict + paq8o10 outputs a smaller file than the new paq8o10t !
    Paq8o10t has some room for improvement !
    Maybe paq8o10t could use this preprocessor

    CMM4 v0.1f :
    timer cmm4 57 fp.log fplog_cmm4_1f_57.cmm4f
    Ratio: 426815/20617071 bytes (0.17 bpc) (Time: 16.07 s)

    timer cmm4 57 fp.dict fplog_dict_cmm4_1f_57.cmm4f
    Ratio: 357406/5736001 bytes (0.50 bpc) (Time: 5.33 s)
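
Note that the two bpc figures above are not directly comparable: cmm4 computes bits per character against whatever input it was given, so the 0.50 bpc is relative to the 5736001-byte fp.dict, not to the original fp.log. A quick sketch of the arithmetic (the function name `bpc` and the constants are just illustrative names, the numbers are from the cmm4 output above):

```python
ORIGINAL_SIZE = 20617071   # fp.log, bytes
DICT_SIZE = 5736001        # fp.dict, bytes (output of the dict pass)

def bpc(compressed_bytes, input_bytes):
    """Bits of output per byte of input."""
    return 8.0 * compressed_bytes / input_bytes

plain_bpc = bpc(426815, ORIGINAL_SIZE)             # what cmm4 prints for fp.log
dict_bpc_reported = bpc(357406, DICT_SIZE)         # what cmm4 prints for fp.dict
dict_bpc_vs_original = bpc(357406, ORIGINAL_SIZE)  # same output, measured against fp.log

print(f"{plain_bpc:.2f} {dict_bpc_reported:.2f} {dict_bpc_vs_original:.2f}")
```

Measured against the original file, the dict + cmm4 result works out to about 0.14 bpc versus 0.17 bpc without dict.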

    RZM 0.07h
    timer rzm c fp.log fplog_rzm007h_nr2.rzm
    20133kb -> 494kb (506420b, 2.46%), done.
    --> 506.427 bytes (rzm reports 7 bytes less !!)
    time = 19.235s

    timer rzm c fp.dict fplog_dict_rzm007h.rzm
    5601kb -> 421kb (431458b, 7.52%), done.
    -> 431.465 bytes (rzm reports 7 bytes less)
    time = 5.148s (about 3.7x faster !!)

    BALZ v1.13
    timer balz ex fp.log fplog_balz113.balz
    -> 551055 bytes in 20.904 sec

    timer balz ex fp.dict fplog_dict_balz113.balz
    -> 514806 bytes in 6.615 sec

    timer balz e fp.log fplog_balz113_e_nr2.balz
    -> 662.941 bytes in 2.777s

    timer balz e fp.dict fplog_dict_balz113_e.balz
    -> 543.768 bytes in 1.982s

    Brute CM v0.1d2
    timer bcm e fp.log fplog_bcm01d2.bcm
    -> 786.749 bytes in 1.388s

    timer bcm e fp.dict fplog_dict_bcm01d2.bcm
    -> 580.857 bytes in 1.186s

    BIT v0.2b
    timer bit02b a fplog_bit02b_mem9 -m lwcx -mem 9 -files fp.log
    -> 576.533 bytes in 26.498s

    timer bit02b a fplog_dict_bit02b_mem9_nr1 -m lwcx -mem 9 -files fp.dict
    -> 481.710 bytes in 9.110s

    7z 4.59 a2 -mx9
    fp.log -> 839.075 bytes
    fp.dict -> 606.485 bytes

    LPAQ 8e
    lpaq8e 7 -> 358267 bytes
    dict + lpaq8e -> 322625 bytes

    lpaq8e 7 fp.log fplog_lpaq8e_7.lpaq8e
    20617071 -> 358267 in 22.287 sec. using 390 MB memory

    lpaq8e 7 fp.dict fplog_dict_lpaq8e_7.lpaq8e
    5736001 -> 322625 in 7.426 sec. using 390 MB memory


    Slim 023d
    fplog_slim_o40_m1024.fb -> 343.637 bytes
    (the smallest file I could get for different orders)
    fplog_dict_prepr_slim_o9_m512.fb -> 319.601 bytes

    PPMonstr J
    fplog_ppmonstr_m900_o64.ppmm -> 355.722 bytes
    fplog_dict_preproc_ppmd_o10_m256.ppmm -> 321.429 bytes

    FP.LOG -order 10
    timer ppmonstr e -ffplog_ppmonstrJ_o10_m900.ppmdm -o10 -m900 fp.log
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.log:20617071 > 472084, 0.183 bpb, used: 13.6MB, speed: 1793 KB/sec
    Global Time = 11.232 = 00:00:11.232 = 100%

    FP.LOG -order 64 gives smaller output :
    timer ppmonstr e -ffplog_ppmonstrJ_o64_m1200.ppmdm -o64 -m1200 fp.log
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.log:20617071 > 355700, 0.138 bpb, used:621.0MB, speed: 913 KB/sec
    Global Time = 22.106 = 00:00:22.106 = 100%
    (PPMonstr reports 355700 bytes although the created file is 355722 bytes)

    FP.DICT -o10
    timer ppmonstr e -ffplog_dict_ppmonstrJ_o10_m500.ppmdm -o10 -m256 fp.dict
    Monstrous PPMII compressor based on PPMd var.J, Feb 16 2006
    fp.dict:5736001 > 321406, 0.448 bpb, used: 26.2MB, speed: 1061 KB/sec
    Global Time = 5.304 = 00:00:05.304 = 100%
    (PPMonstr reports 321406 bytes although the created file is 321.429 bytes)

    PPMonstr J outputs a smaller file using dict preprocessor; both compression time and memory usage decrease.

    FreeArc 0.50a (June 9 200

    timer arc a -mx fplog_arc5a5_mx_nr4 fp.log
    FreeArc 0.50 alpha (June 9 200 updating archive: fplog_arc5a5_mx.arc
    Compressed 1 file, 20.617.071 => 527.623 bytes. Ratio 2.5%
    Compression time 1.76 secs, speed 11.696 kB/s. Total 2.11 secs
    All OK
    Global Time = 2.146 = 00:00:02.146 = 100%

    or a bit optimized :
    timer arc a -mdict:30m+ppmd:9:900mb fplog_arc5a5_opt fp.log
    FreeArc 0.50 alpha (June 9 200 creating archive: fplog_arc5a5_opt
    Compressed 1 file, 20.617.071 => 489.353 bytes. Ratio 2.3%
    Compression time 1.12 secs, speed 18.356 kB/s. Total 1.22 secs
    All OK
    Global Time = 1.358 = 00:00:01.358 = 100%
    -> 489590 bytes in 1.35s !!

    Conclusion (only for FP.LOG !!)
    All tested compressors/archivers output a smaller file with DICT preprocessing. Compression is also faster, up to about 4x for some compressors (e.g. paq8o10, rzm).
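
For anyone who wants to recompute the speedups, here is a small sketch using only the runs above where both the plain and the dict run were timed with the same settings. The numbers are copied from this post (actual file sizes where they differ from what the tool reported); the labels and variable names are just illustrative. The times do not include the 0.733s dict pass itself.

```python
# (label, plain bytes, plain seconds, dict bytes, dict seconds), copied from the post.
RUNS = [
    ("paq8o10 -7", 264985, 1984.70, 223619, 516.85),
    ("paq8o10 -8", 263139, 2040.63, 223267, 518.68),
    ("cmm4 57", 426815, 16.07, 357406, 5.33),
    ("rzm c", 506427, 19.235, 431465, 5.148),
    ("balz ex", 551055, 20.904, 514806, 6.615),
    ("balz e", 662941, 2.777, 543768, 1.982),
    ("bcm e", 786749, 1.388, 580857, 1.186),
    ("bit02b", 576533, 26.498, 481710, 9.110),
    ("lpaq8e 7", 358267, 22.287, 322625, 7.426),
]

# Speedup of the compressor step alone (plain time / dict-input time).
speedups = {label: plain_t / dict_t for label, _, plain_t, _, dict_t in RUNS}
all_smaller = all(dict_b < plain_b for _, plain_b, _, dict_b, _ in RUNS)
fastest = max(speedups, key=speedups.get)
slowest = min(speedups, key=speedups.get)

for label, ratio in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{label:12s} {ratio:5.2f}x")
```

On these pairs the speedup ranges from about 1.2x (bcm) to about 3.9x (paq8o10 -8). For the very fast compressors the dict pass is not negligible: adding 0.733s to the bcm run on fp.dict makes the total slower than bcm on fp.log alone.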
    Last edited by pat357; 22nd June 2008 at 00:39.

  2. #2
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by pat357 View Post
    BIT v0.2b
    timer bit02b a fplog_bit02b_mem9 -m lwcx -mem 9 -files fp.log
    -> 576.533 bytes in 26.498s

    timer bit02b a fplog_dict_bit02b_mem9_nr1 -m lwcx -mem 9 -files fp.dict
    -> 481.710 bytes in 9.110s
    Hmm... It seems BIT needs text filtering like that. Good job, Bulat. You always write very good preprocessors. Thanks, pat357 and Bulat.

  3. #3
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I want to share some other tests with ENWIK8.

    BIT 0.2b (-mem 22,190,736 bytes (271.289 seconds)

    DICT 52,812,125 bytes (11.316 seconds)
    BIT 0.2b (-mem + DICT 21,798,670 bytes (156.977+11.316 seconds)

    DICT (-P) 53,897,083 bytes (9.042 seconds)
    BIT 0.2b (-mem + DICT (-P) 21,522,217 bytes (159.678+9.042 seconds)

    Tested on AMD Athlon 64 X2 Dual 4200+ (2.2 GHz), 1 GB RAM, WinXP SP3.
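
The ENWIK8 figures above already include the DICT time in the totals, so the overall gain can be worked out as follows. This is just a sketch of the arithmetic; the names `BASELINE`, `WITH_DICT`, `WITH_DICT_P` and `gain` are made up here, and the numbers are the ones listed in this post.

```python
BASELINE = (22190736, 271.289)             # BIT 0.2b alone: (bytes, seconds)
WITH_DICT = (21798670, 156.977 + 11.316)   # BIT time + DICT time
WITH_DICT_P = (21522217, 159.678 + 9.042)  # BIT time + DICT (-P) time

def gain(baseline, candidate):
    """Return (size saving in %, overall speedup factor)."""
    saving = 100.0 * (baseline[0] - candidate[0]) / baseline[0]
    speedup = baseline[1] / candidate[1]
    return saving, speedup

dict_gain = gain(BASELINE, WITH_DICT)
dict_p_gain = gain(BASELINE, WITH_DICT_P)

print(f"DICT:      {dict_gain[0]:.2f}% smaller, {dict_gain[1]:.2f}x faster overall")
print(f"DICT (-P): {dict_p_gain[0]:.2f}% smaller, {dict_p_gain[1]:.2f}x faster overall")
```

So on ENWIK8 the size gain is modest (roughly 2-3%), but the total time still drops by about 1.6x in both cases.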

  4. #4
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    323
    Thanks
    179
    Thanked 52 Times in 37 Posts
    Is there a file size limit with Dict? With an 11 MB shar archive there is no problem, but with a 150 MB file (for example) no output file is created and Dict does not display an error. I use it as a preprocessor for all data, so this file size problem keeps coming up. Thanks.
    Last edited by comp1; 2nd May 2014 at 22:07.

  5. #5
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    pat357
    Another interesting preprocessor for highly redundant txt files is Bulat's DICT (http://www.haskell.org/bz/)
    That link doesn't work for me. (404 Not Found.)

