
Thread: Tornado 0.3

  1. #1
    Bulat Ziganshin (Programmer)
    I've finally uploaded Tornado 0.3 to http://www.haskell.org/bz/tornado03.zip

    Changes in 0.2:
    -- lazy parsing
    -- 3-byte matches
    -- Huffman coder
    -- sliding window

    Changes in 0.3:
    -- repdist & repchar0 codes
    -- 2-byte matches
    -- optimized LZ parsing
    -- table preprocessing
    -- gzip-like command-line interface

    Comparable modes (Tornado level vs. other compressors):
    -1 thor e1, quicklz
    -2 thor e2, slug
    -3 thor e3, gzip -1
    -4 gzip, rar -m1
    -5 thor, 7zip -mx1
    -6 uharc -mz
    -7 bzip2, rar -m2

  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Thanks Bulat! But what does "optimized LZ parsing" mean?

  3. #3
    Bulat Ziganshin (Programmer)
    Quote Originally Posted by encode
    Optimized LZ Parsing?
    Improved here and there. It still uses "lazy matching". You can find the list of changes in main.cpp. For example, I used Igor's code for choosing between a short near match and a longer far match.
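
    As a side note, here is a minimal sketch of the general idea (hypothetical code, not the actual routine from main.cpp): a farther match spends more bits on its distance code, so it should only be preferred over a nearer one if it is enough bytes longer to pay for that extra cost.
    Code:
    #include <cstdint>

    struct Match {
        uint32_t len;    // match length in bytes
        uint32_t dist;   // distance back into the sliding window
    };

    // Rough cost of a distance code in bits: floor(log2(dist)) plus a small constant.
    static int dist_bits(uint32_t dist) {
        int bits = 0;
        while (dist >>= 1) ++bits;
        return bits + 2;                 // assumed fixed overhead per distance code
    }

    // Choose between a short near match and a longer far match: the far match
    // wins only if the literals it saves (8 bits each, assumed) outweigh the
    // extra bits its larger distance costs.
    static Match pick_match(const Match& near_m, const Match& far_m) {
        int gain_bits = ((int)far_m.len - (int)near_m.len) * 8;
        int extra_dist_bits = dist_bits(far_m.dist) - dist_bits(near_m.dist);
        return (gain_bits > extra_dist_bits) ? far_m : near_m;
    }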

  4. #4
    Nania Francesco (Tester)
    THANKS Bulat! Hi !

  5. #5
    Bulat Ziganshin (Programmer)
    I've seen discussions here about LZHUF/LZARI encoding. Using my Tornado experience, I can clear up a few things...

    Look at these results:
    Code:
    >tor -o  delphi 
    ari: 369773 -> 96636 kb (26.1%): 53.393 sec 
     
    >tor -o -c3 delphi 
    huf: 369773 -> 98126 kb (26.5%): 51.552 sec 
     
    >tor6++ -o -c3 delphi 
    huf: 369773 -> 96940 kb (26.2%): 59.697 sec 
    cnt+=1, huf: 2700/6 
     
     
     
    >tor -o  oo 
    ari: 368738 -> 84514 kb (22.9%): 43.322 sec 
     
    >tor -o -c3 oo 
    huf: 368738 -> 85501 kb (23.2%): 41.890 sec 
     
    >tor6++ -o -c3 oo 
    huf: 368738 -> 84756 kb (23.0%): 48.509 sec


    Here, delphi represents binary data and oo textual data. The first test for each file uses ARI encoding, the second the current HUFFMAN coder, and the third the same HUFFMAN coder with settings close to those of the ARI coder. As you'll see, while my default HUFFMAN implementation is 1-1.5% worse than ARI, with the same settings the difference is only 0.3%.

    At the same time, HUFFMAN at these settings becomes very slow, spending a lot of time rebuilding trees. My implementation is semi-dynamic: both ARI and the "improved Huffman" rebuild their trees after every 10-20k elements decoded. Don't forget that the semi-dynamic approach is symmetric and requires the same tree rebuilding at the decode stage too, where the overall speed decrease may be 2x-3x (decompression time for both files is roughly 5-10 s).
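
    For illustration, a minimal sketch of such a semi-dynamic update loop (the type, names and period are hypothetical, not Tornado's code): encoder and decoder run the same counting logic, so they rebuild identical tables at identical points in the symbol stream and no table has to be transmitted.
    Code:
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct SemiDynamicCoder {
        std::vector<uint32_t> freq;      // running symbol frequencies
        uint32_t since_rebuild = 0;
        uint32_t update_period;          // e.g. 10k-20k symbols, as above

        SemiDynamicCoder(size_t alphabet, uint32_t period)
            : freq(alphabet, 1), update_period(period) {}

        // Called once per symbol on BOTH the encode and the decode side.
        void count(unsigned sym) {
            ++freq[sym];
            if (++since_rebuild >= update_period) {
                rebuild_tables();        // new Huffman code lengths / ari cumfreqs
                since_rebuild = 0;
            }
        }

        void rebuild_tables() { /* derive fresh codes from freq[] */ }
    };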

    This should answer your second question: does STATIC HUFFMAN compress better than a DYNAMIC one? No, I think; it's just much faster at the decoding stage, which is why it has been used since PKZIP 1.x.

    But there are ways to speed up tree rebuilding, making this (very complex) static approach unnecessary. These methods were described by KADACH in his thesis. In short, we should
    1) use merging of sorted lists instead of heap manipulation when building the Huffman tree
    2) presort the symbol frequencies using sort<> instead of qsort() - the latter spends too much time calling the comparison routine
    3) even better, use a radix sort (AFAIR, 2 "digits" is optimal) to make the sort even faster

    This allows the tree to be built in O(n) time instead of O(n*log(n)), once the frequencies are sorted. Tornado already implements (1); otherwise even its current settings (rebuilding the tree every 50k elements) would probably be too slow compared to ARI. Implementing all three improvements would allow a semi-dynamic LZHUF coder that's only 0.3% worse than LZARI implementations but decompresses about 1.5x faster.
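
    For reference, a minimal sketch of point (1), assuming the frequencies are already sorted ascending (which is what points (2) and (3) accelerate); this is the textbook two-queue construction, not Tornado's own code:
    Code:
    #include <cstdint>
    #include <queue>
    #include <vector>

    struct Node {
        uint64_t freq;
        int left, right;                 // child indices; -1 marks a leaf
    };

    // freqs must be sorted ascending; the root ends up as the last node.
    std::vector<Node> build_huffman(const std::vector<uint64_t>& freqs) {
        std::vector<Node> nodes;
        for (uint64_t f : freqs) nodes.push_back({f, -1, -1});
        std::queue<int> leaves, merged;  // both queues stay sorted by freq
        for (int i = 0; i < (int)freqs.size(); ++i) leaves.push(i);

        // Take the cheaper of the two queue fronts in O(1); no heap needed.
        auto pop_min = [&]() {
            bool take_leaf = !leaves.empty() &&
                (merged.empty() || nodes[leaves.front()].freq <= nodes[merged.front()].freq);
            int idx = take_leaf ? leaves.front() : merged.front();
            (take_leaf ? leaves : merged).pop();
            return idx;
        };

        while (leaves.size() + merged.size() > 1) {
            int a = pop_min(), b = pop_min();
            nodes.push_back({nodes[a].freq + nodes[b].freq, a, b});
            merged.push((int)nodes.size() - 1);   // sums come out in nondecreasing order
        }
        return nodes;
    }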

    This remains true for all simple, AR002-derived coders (LZH, squeeze, deflate, RAR, ACE, cabarc, Tornado). Only LZMA-like beasts get real improvements from using arithmetic coders.

    PS: see a bit more info in my post http://www.encode.su/forums/index.ph...ic=408#msg4265

  6. #6
    Moderator
    Thanks Bulat!

  7. #7
    Bulat Ziganshin (Programmer)
    Small correction: I've had a closer look at the Tornado sources and found that the update frequencies are:
    ari: ~1400 (RANGE/6)
    huf: 50000
    improved huf: 2700

    This lets me surmise that STATIC Huffman will be no better, and may even be worse, than my SEMI-ADAPTIVE approach. For texts, statistics change rather slowly, so it's not much of a problem that I encode data using old stats. For binary data, stats change rather fast. With the deflate approach, we start a new block every ~8k elements, and this value can't be decreased because we would need more bits for encoding the Huffman tables themselves. My "improved huf" updates its tables every 2700 elements and can do so even more often. This means it has a chance to better fit the actual stats.

    There is also one more improvement, which closes the gap between HUF and ARI to 0.1% on binary data and to almost nothing on texts, but I'm not yet sure whether it can be implemented efficiently, nor whether the same improvement could be implemented efficiently for ARI too. But it will probably work, making semi-adaptive Huffman an almost ideal solution for LZ77 backends.

  8. #8
    Moderator
    Quick (-12) test...

    A10.jpg > 847,244
    AcroRd32.exe > 1,582,796
    english.dic > 1,083,660
    FlashMX.pdf > 3,790,815
    FP.LOG > 958,896
    MSO97.DLL > 2,044,392
    ohs.doc > 840,116
    rafale.bmp > 1,185,646
    vcfiu.hlp > 735,520
    world95.txt > 665,261

    Total = 13,734,346 bytes

  9. #9
    Moderator
    Another quick (-1) test...

    Test machine: Intel PIII (Coppermine) @750 MHz, 512 MB RAM, Windows 2000 Pro SP4

    Test File: ENWIK9 (1,000,000,000 bytes)

    Timed with AcuTimer v1.2


    ENWIK9 > 531,348,504 bytes
    Elapsed Time: 00:01:55.203 (115.203 Seconds)

  10. #10
    Bulat Ziganshin (Programmer)
    Quote Originally Posted by LovePimple
    Another quick (-1) test...
    Both tests show extreme points, which are not impressive because I've optimized mainly for the -5 mode used in FreeArc. This mode should show a good speed/compression ratio, especially on binary data.

  11. #11
    Moderator
    Quick (-5) test...

    A10.jpg > 847,148
    AcroRd32.exe > 1,618,547
    english.dic > 1,114,205
    FlashMX.pdf > 3,828,773
    FP.LOG > 1,250,907
    MSO97.DLL > 2,073,283
    ohs.doc > 862,412
    rafale.bmp > 1,330,265
    vcfiu.hlp > 802,245
    world95.txt > 782,159

    Total = 14,509,944 bytes

  12. #12
    Moderator
    Another quick (-5) test...

    Test machine: Intel PIII (Coppermine) @750 MHz, 512 MB RAM, Windows 2000 Pro SP4

    Test File: ENWIK9 (1,000,000,000 bytes)

    Timed with AcuTimer v1.2


    ENWIK9 > 300,731,066 bytes
    Elapsed Time: 00:05:29.812 (329.812 Seconds)

  13. #13
    Matt Mahoney (Expert)
    I tested with -11; -12 caused too much disk thrashing with only 2 GB of RAM.

    http://cs.fit.edu/~mmahoney/compression/text.html#2594

