Page 1 of 2
Results 1 to 30 of 32

Thread: TC 5.2dev1 overview

  1. #1
    encode (The Founder)
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Today I started writing a new engine. It will be something like an improved TC 5.1dev1...dev3. Actually, I am writing this engine either for a new archiver or as a new algorithm for the existing PIM. Main features:
    + Fast decompression (faster than TC 5.1dev1)
    + Low memory usage: ~25 MB or less
    Anyway, it will be something like a modern Deflate algorithm – fast and memory-efficient. In addition, it will have three compression modes:
    + Fast (Greedy Parsing) (TC 5.1dev1 uses such parsing)
    + Normal (Lazy Evaluation) (TC 5.1dev2 and later uses such parsing)
    + Max (Flexible Parsing/Optimal Parsing)
    That means TC becomes more of an LZ encoder than a PPM/CM one. As for compression, I will try to maximize it as much as possible, but not at the cost of decompression speed. I hope to get more compression from Optimal Parsing and a better LZ layer. On top of that, the new TC will keep the slightly improved Small PPM from TC 5.1dev1...5.1dev3. As you see, I am moving back to a ROLZ2-like algorithm instead of ROLZ3. If you ask me about super-compression, I can say that any move in that direction leads to PAQ or PAQ-like engines, and there is no reason to reinvent the wheel and write my own PAQ-like engine. Right now I am looking at LZ77 coders. For example, look at LZMA – it has high compression and at the same time extremely fast decompression. Of course, Deflate is faster, but look at the compression power... One of the reasons why 7-Zip (LZMA) does so well is Optimal Parsing. If I adopt something similar in my TC, it will be very nice. As Malcolm Taylor said, his ROLZ2/ROLZ3 also gets most of its power from Optimal Parsing. So anyway, let's wait and see...
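The three parsing modes can be illustrated with a toy LZ77 parser. This is a minimal sketch with invented names, not TC's actual ROLZ engine: greedy takes the longest match at each position immediately, while lazy evaluation defers a match when the next position offers a longer one.

```python
def find_longest_match(data, pos, window=1 << 12):
    """Brute-force search for the longest earlier match at `pos`."""
    best_len, best_dist = 0, 0
    for cand in range(max(0, pos - window), pos):
        n = 0
        while pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_dist = n, pos - cand
    return best_len, best_dist

def greedy_parse(data, min_len=3):
    """Greedy Parsing: always take the longest match at the current position."""
    ops, pos = [], 0
    while pos < len(data):
        length, dist = find_longest_match(data, pos)
        if length >= min_len:
            ops.append(('match', length, dist))
            pos += length
        else:
            ops.append(('lit', data[pos]))
            pos += 1
    return ops

def lazy_parse(data, min_len=3):
    """Lazy Evaluation: defer a match if pos+1 offers a longer one."""
    ops, pos = [], 0
    while pos < len(data):
        length, dist = find_longest_match(data, pos)
        if length >= min_len and pos + 1 < len(data):
            next_len, _ = find_longest_match(data, pos + 1)
            if next_len > length:          # deferring wins: emit a literal
                ops.append(('lit', data[pos]))
                pos += 1
                continue
        if length >= min_len:
            ops.append(('match', length, dist))
            pos += length
        else:
            ops.append(('lit', data[pos]))
            pos += 1
    return ops

def lz_decode(ops):
    """Replay literals and (length, distance) copies."""
    out = bytearray()
    for op in ops:
        if op[0] == 'lit':
            out.append(op[1])
        else:
            _, length, dist = op
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

Both parsers produce a stream the same decoder replays, which is exactly why the post can change parsing strategy without touching decompression speed.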


  2. #2
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    As long as PIM's compression eventually beats or matches that of WinRAR on most files, I will be happy.

  3. #3
    encode
    These days I'm playing with a few different parsing schemes, including:
    + Enhanced Lazy Evaluation (or Lazy Matching)
    + Flexible Parsing
    The best one is Flexible Parsing (FP); this solution is near-optimal. Note that with a fixed cost for literal/match coding, FP is optimal, but here we have PPM/adaptive entropy coding... The actual best is the Dynamic Programming approach (used by 7-Zip and ROLZ), but today I have no idea how to integrate such a parsing scheme into my TC. The Dynamic Programming approach can cope with adaptive encodings (i.e., along with a few steps of lookahead, it looks at the real literal and match costs and only then makes a decision). Okay, let's get back to Flexible Parsing. According to my experiments, FP gives a very nice gain on almost all files, especially on:
    + LOG files - an extreme compression gain
    + TXT files - a really noticeable gain
    + BINARY files - a moderate or small gain, but this completely depends on the file
    To be honest, until now I did not know (or believe) that better parsing could give a serious compression gain. Just some approximate figures off the top of my head:

    World95.txt
    Greedy Parsing - ~660 KB
    Lazy Evaluation - ~628 KB
    Flexible Parsing - ~600 KB

    Fp.log
    Greedy Parsing - >800 KB
    Lazy Evaluation - ~760 KB
    Flexible Parsing - ~650 KB

    These figures are from the new TC, with fast decompression (> 16 MB/s) and a small memory footprint (~24 MB). (I am planning to release a prototype of this version on 1 January, by the way - like a New Year gift.) However, due to the advanced parsing, compression speed really suffers: compression is a few times slower compared to TC 5.1dev2. But since decompression speed is NOT affected, or decompression even becomes faster (more matches = more speed), such a compression mode can be optional (in the GUI version; the CL version of TC will have one mode, at least that is my current plan).

    Hm... Well, very soon, after I do more experiments and write a final version of the new Flexible Parser, I will post some exact results here...
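The Dynamic Programming approach mentioned above can be sketched under the simplifying assumption of fixed bit costs per literal and per match (the adaptive-cost case the post describes is exactly what makes real integration hard). Function names and cost constants here are invented for illustration:

```python
def find_longest_match(data, pos, window=1 << 12):
    """Brute-force search for the longest earlier match at `pos`."""
    best_len, best_dist = 0, 0
    for cand in range(max(0, pos - window), pos):
        n = 0
        while pos + n < len(data) and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_dist = n, pos - cand
    return best_len, best_dist

def optimal_parse(data, lit_cost=9, match_cost=24, min_len=3):
    """Min-cost parse via DP: cost[i] is the cheapest encoding of data[:i]."""
    n = len(data)
    cost = [0.0] + [float('inf')] * n
    back = [None] * (n + 1)                 # op that ends at position i
    for i in range(n):
        # literal step
        if cost[i] + lit_cost < cost[i + 1]:
            cost[i + 1] = cost[i] + lit_cost
            back[i + 1] = ('lit', data[i])
        # every usable match length at i (a long match implies all shorter ones)
        length, dist = find_longest_match(data, i)
        for l in range(min_len, length + 1):
            if cost[i] + match_cost < cost[i + l]:
                cost[i + l] = cost[i] + match_cost
                back[i + l] = ('match', l, dist)
    # walk back from the end to recover the op sequence
    ops, i = [], n
    while i > 0:
        op = back[i]
        ops.append(op)
        i -= 1 if op[0] == 'lit' else op[1]
    ops.reverse()
    return ops, cost[n]

def lz_decode(ops):
    """Replay literals and (length, distance) copies."""
    out = bytearray()
    for op in ops:
        if op[0] == 'lit':
            out.append(op[1])
        else:
            for _ in range(op[1]):
                out.append(out[-op[2]])
    return bytes(out)
```

With fixed costs this finds the globally cheapest parse; making the costs adaptive (the real literal/match prices from a PPM coder) is the integration problem the post is wrestling with.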


  4. #4
    encode
    Some interesting results with TC 5.1dev6. I replaced the current Lazy Evaluation with the new Flexible Parsing. Note the new parser is more efficient with TC 5.2devX (fast decompression, and more LZ than CM).

    Performance on SFC (Experimental TC 5.1dev7)
    A10.jpg: 830,453 bytes
    acrord32.exe: 1,310,504 bytes
    english.dic: 835,070 bytes
    FlashMX.pdf: 3,696,298 bytes
    fp.log: 589,787 bytes
    mso97.dll: 1,715,014 bytes
    ohs.doc: 785,648 bytes
    rafale.bmp: 976,844 bytes
    vcfiu.hlp: 605,676 bytes
    world95.txt: 583,026 bytes

    Total: 11,928,320 bytes


  5. #5
    encode
    As proof, you can download this file:

    fp.tc (575 KB)

    This file is packed with the experimental TC 5.1dev7 version, which is completely compatible with TC 5.1dev6. So you can download this file, unpack it with dev6, and verify its contents...


  6. #6
    Guest
    Hi Ilia,

    Stephan Busch here.
    I have finally built my own website. Feel free to have a look in
    a free minute

    http://squeezechart.meine-hp.net/index.htm

    Yours,

    Stephan Busch

  7. #7
    Moderator
    Quote Originally Posted by encode
    As proof, you can download this file:

    fp.tc (575 KB)

    This file is packed with the experimental TC 5.1dev7 version, which is completely compatible with TC 5.1dev6. So you can download this file, unpack it with dev6, and verify its contents..
    Yes, it's very impressive!

  8. #8
    encode
    Hi Stephan!

    I hope you received my email (from my new email address). But currently I think the PDF was better; you probably need a better design. For some inspiration in web design, you can look at this excellent site and blog engine:

    wordpress.org

  9. #9
    encode
    Okay, here are some test results for the TC 5.2dev1 alpha non-public release. It uses a ROLZ2-like algorithm, grabs just 24 MB of memory, and has fast decompression. Also note it has no filters - just the pure engine.

    Results
    A10.jpg: 852,941 bytes
    acrord32.exe: 1,705,490 bytes
    english.dic: 849,879 bytes
    FlashMX.pdf: 3,802,230 bytes
    fp.log: 670,015 bytes
    mso97.dll: 2,071,794 bytes
    ohs.doc: 854,590 bytes
    rafale.bmp: 1,034,419 bytes
    vcfiu.hlp: 702,247 bytes
    world95.txt: 616,500 bytes

    Total: 13,160,105 bytes

    This one uses Flexible Parsing. Note I'm still working on better parsing, so this will probably be the 'Normal' mode:
    1. Fast - Lazy Evaluation
    2. Normal - Flexible Parsing
    3. Max - Dynamic Programming


  10. #10
    encode


    After these results were posted, within a minute I invented an improvement which gives some additional compression gain in all cases...



    So, I'm not finished...

  11. #11
    Moderator
    Do you think the next release of TC will compress better than PIM 1.25 on the SFC test?

  12. #12
    encode
    Keep in mind:
    PIM uses ~56 MB and has EXE and Multimedia filters.
    TC 5.1dev6 uses ~144 MB and has an EXE filter with auto-detection (it can detect x86 code inside any file, including TAR archives, and preprocess it).
    TC 5.2dev1 uses ~24 MB and has no filters. (i.e., with an EXE filter and slightly better parsing it can outperform PIM, even with such a small memory footprint)

    The goal of the new TC is really fast decompression -- much faster than PIM's and incomparably faster than TC 5.1dev6's. To do this, I will try to keep the decoder as simple as possible. Additionally, I want to keep good compression. Note that with the current Flexible Parsing, TC's compression speed really suffers. The worst case is highly redundant data such as 'fp.log' -- compression can be 20X or more times slower, but on these files we also see a real compression gain. Anyway, I'll keep in mind only:
    1. decompression speed
    2. memory usage
    So compression time really doesn't matter, and I'll try to get the maximum compression while keeping the decoder untouched.


  13. #13
    Moderator
    OK!

  14. #14
    encode
    It's funny, but 7-Zip 4.42 with Ultra compression compresses fp.log to 838,823 bytes...

  15. #15
    encode
    Quote Originally Posted by encode
    It's funny, but 7-Zip 4.42 with Ultra compression compresses fp.log to 838,823 bytes...
    Just another nice feature of this forum... Very handy!

  16. #16
    encode
    Latest results:
    A10.jpg: 852,942 bytes
    acrord32.exe: 1,702,298 bytes
    english.dic: 831,212 bytes
    FlashMX.pdf: 3,801,443 bytes
    fp.log: 653,521 bytes
    mso97.dll: 2,068,462 bytes
    ohs.doc: 854,257 bytes
    rafale.bmp: 1,034,901 bytes
    vcfiu.hlp: 701,259 bytes
    world95.txt: 615,821 bytes

    Total: 13,116,116 bytes
    This version uses slightly improved Flexible Parsing, and memory usage was increased to 32 MB -- for more efficient memory use. Next will be an improved EXE filter, plus, finally, a Dynamic Programming approach, which can give a very serious compression gain...


  17. #17
    Moderator
    Quote Originally Posted by encode
    It's funny, but 7-Zip 4.42 with Ultra compression compresses fp.log to 838,823 bytes...
    That's a disgraceful performance from 7-Zip!

  18. #18
    Moderator
    Quote Originally Posted by encode
    Just another nice feature of this forum... Very handy!
    Yes, much better than the text quote!

  19. #19
    Guest
    If it is possible, please add some kind of picture verification, like the one seen at
    wc3campaigns.net/revolution/forum/viewtopic.php?t=10058

  20. #20
    encode
    Quote Originally Posted by Black_Fox
    If it is possible, please add some kind of picture verification, like the one seen at
    wc3campaigns.net/revolution/forum/viewtopic.php?t=10058
    Actually, miniBB has a "Human Authorization (CAPTCHA)" plugin, but it is a paid plugin. It is probably possible to find a free one or to write this feature manually, but unfortunately I have no spare time and prefer to spend any free minute improving TC. Anyway, if someone can find a free plugin/solution for adding such authorization, I will be very grateful!


  21. #21
    encode
    By the way, all high-performance LZ-based archivers use Flexible Parsing as a base for "Optimal Parsing". Such a parsing scheme has at least one parameter - the search depth.

    In 7-Zip this parameter is called:
    Word size

    In WinRK:
    Largest optimised match

    Currently, TC uses the maximum possible "Largest optimized match" to provide the highest compression. If we cut this value down, we can gain speed at the cost of compression ratio. So anyway, I'm still optimizing and improving...
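The search-depth knob described above ("Word size" / "Largest optimised match") can be modeled by capping the match lengths the optimizer is allowed to consider. This is a sketch with an invented fixed cost model, not TC's actual coder; it returns both the parse cost and a rough count of the optimizer work done:

```python
def depth_limited_cost(data, depth, lit_cost=9, match_cost=24,
                       min_len=3, window=1 << 12):
    """Optimal-parse cost when match lengths are capped at `depth`.

    Returns (total_bits, candidates_tried): the second number is a crude
    proxy for the extra optimizer work that a larger depth causes."""
    n = len(data)
    cost = [0.0] + [float('inf')] * n
    tried = 0
    for i in range(n):
        # literal step keeps every position reachable
        if cost[i] + lit_cost < cost[i + 1]:
            cost[i + 1] = cost[i] + lit_cost
        # longest match at i, brute force
        best = 0
        for cand in range(max(0, i - window), i):
            m = 0
            while i + m < n and data[cand + m] == data[i + m]:
                m += 1
            best = max(best, m)
        # only lengths up to `depth` are offered to the optimizer
        for l in range(min_len, min(best, depth) + 1):
            tried += 1
            if cost[i] + match_cost < cost[i + l]:
                cost[i + l] = cost[i] + match_cost
    return cost[n], tried
```

A larger depth can never produce a worse parse (the smaller depth's solutions are a subset of the larger one's), but it makes the optimizer try more candidates - which is exactly the speed/ratio trade-off the post describes.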


  22. #22
    Guest
    There is no free CAPTCHA plugin for miniBB AFAIK...
    If somebody (or you, encode) has some time and basic PHP experience, you could look at this site: hxxp://www.white-hat-web-design.co.uk/articles/php-captcha.php
    It seems quite easy to implement, although I don't know how complex it may be to integrate into miniBB.

  23. #23
    Moderator
    Quote Originally Posted by encode
    Currently, TC uses the maximum possible "Largest optimized match" to provide the highest compression. If we cut this value down, we can gain speed at the cost of compression ratio.
    I don't like the idea of gaining speed at the cost of compression ratio!

  24. #24
    encode
    Quote Originally Posted by LovePimple
    I don't like the idea of gaining speed at the cost of compression ratio!
    Well, this is just about the future Normal mode of the new engine. For example, with a Largest Optimized Match of eight, in most cases TC provides about the same compression as with the maximum value, while being a few times faster on average; in some cases, like fp.log, it can be about 10-20X faster. However, the maximum Largest Optimized Match should provide the best compression.

    Note, I am still experimenting with optimal parsing - for example, I have found a new tiny improvement. However, currently I have no ideas for future improvements that could give a significant compression gain. Therefore, I have started experimenting with the ROLZ output coding. For example, PIM achieves such good compression on text files thanks to an order-3-1-0 PPM. The structures of these programs (PIM and the current TC) are quite similar; in fact, the new TC is just a sort of improved PIM, so I can simply add the stronger PPM to TC. It is funny, but some time ago the PPMC algorithm was the strongest (and the slowest) one... Note, PIM uses a hash table that is inefficient in terms of speed, but in terms of compression ratio, and as part of a hybrid algorithm, it is the best. However, speed really suffers only if we go beyond order-3 or so. That means an order-2-1-0 PPM should be noticeably faster. Therefore, I will try to add an order-2-1-0 PPM to TC and see what it gives. Such a stronger PPM needs an additional 2 MB of memory (currently TC uses ~32 MB, plus 2 = ~34 MB, still okay). Actually, this variant of PPM is best for binary data - higher orders give less compression on binary data and, at the same time, higher compression on text files. IMHO, a small PAQ encoder like the one used in 5.1dev4-dev6 is too heavy for fast decompression... Well, watch for updates!
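As a rough illustration of the order-2-1-0 idea, here is a toy PPM-style byte model with method-A-style escape estimates and no exclusion. It only estimates code length in bits; a real implementation would drive an arithmetic coder and use hash tables, as the post discusses. All names and the escape formula are invented for this sketch:

```python
import math
from collections import defaultdict

class ToyPPM:
    """Toy order-2-1-0 byte model with escapes (roughly PPM method A).

    No exclusion, no arithmetic coder - it just estimates code length."""

    def __init__(self, orders=(2, 1, 0)):
        self.orders = orders
        self.tables = {k: defaultdict(lambda: defaultdict(int)) for k in orders}

    def cost_bits(self, history, sym):
        """Estimated bits to code `sym` after `history`, highest order first."""
        bits = 0.0
        for k in self.orders:
            ctx = history[-k:] if k else b''
            counts = self.tables[k][ctx]
            total = sum(counts.values())
            if total == 0:
                continue                      # empty context: free escape
            if counts[sym] > 0:
                return bits - math.log2(counts[sym] / (total + 1))
            bits += math.log2(total + 1)      # escape cost: -log2(1/(total+1))
        return bits + 8.0                     # order(-1): uniform over 256 bytes

    def update(self, history, sym):
        for k in self.orders:
            ctx = history[-k:] if k else b''
            self.tables[k][ctx][sym] += 1

def estimate(data):
    """Total estimated bits to code `data` with the adaptive model."""
    model, total = ToyPPM(), 0.0
    for i, sym in enumerate(data):
        total += model.cost_bits(data[:i], sym)
        model.update(data[:i], sym)
    return total
```

Dropping the order-3 context (order-2-1-0 instead of order-3-1-0) shrinks the tables and speeds up lookups, which matches the post's reasoning about keeping the decoder fast.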


  25. #25
    encode
    TC 5.2dev1 with order-2 literal coder:
    A10.jpg: 859,572 bytes
    acrord32.exe: 1,673,411 bytes
    english.dic: 878,790 bytes
    FlashMX.pdf: 3,785,047 bytes
    fp.log: 627,515 bytes
    mso97.dll: 2,023,010 bytes
    ohs.doc: 851,008 bytes
    rafale.bmp: 1,032,676 bytes
    vcfiu.hlp: 687,292 bytes
    world95.txt: 596,886 bytes

    Total: 13,015,207 bytes
    A tiny compression gain... At the same time, decompression speed was REALLY affected. Anyway, I also have an idea for a compromise between a classical PPM and a CM encoder...


  26. #26
    Moderator
    Looks like this new engine will be awesome!

  27. #27
    encode
    Traditionally, a new improvement = newly posted results. Here I went back to the original order-1 PPM. This one uses an interesting heuristic - it computes the cost of encoding a match and compares it to the match length. Thus, at some point it is better to encode a shorter match with a small offset than a longer one with a large offset.

    A10.jpg: 852,943 bytes
    acrord32.exe: 1,700,524 bytes
    english.dic: 826,494 bytes
    FlashMX.pdf: 3,800,388 bytes
    fp.log: 646,450 bytes
    mso97.dll: 2,066,913 bytes
    ohs.doc: 852,796 bytes
    rafale.bmp: 1,033,252 bytes
    vcfiu.hlp: 695,117 bytes
    world95.txt: 612,248 bytes

    Total: 13,087,125 bytes
    Well, it looks like performance on some files is really good - TC is ready to outperform WinAce even with no filters. Note that at Squeeze Chart, TC 5.1dev3 easily outperforms WinAce; TC 5.2dev1 uses less memory and at the same time compresses noticeably better. I think the goal of the "pure" engine is to outperform WinRAR at Squeeze Chart. But I will not hurry; I will carefully play with this price and adaptive-encoding stuff.
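The pricing heuristic described in this post can be sketched as comparing candidate matches by estimated bits per covered byte. The cost model below (a flag bit, a gamma-like length code, and an offset that costs its own bit-length) is an invented stand-in for TC's real prices:

```python
def match_price_bits(length, dist):
    """Hypothetical bit price of a match: 1 flag bit, a gamma-like
    length code, and an offset coded in its own bit-length."""
    return 1 + 2 * length.bit_length() + dist.bit_length()

def cheaper_match(a, b):
    """Pick the (length, distance) candidate with the lower price per
    covered byte; compare cross-multiplied to avoid float division."""
    (la, da), (lb, db) = a, b
    if match_price_bits(la, da) * lb <= match_price_bits(lb, db) * la:
        return a
    return b
```

Under this model a length-6 match at distance 12 beats a length-8 match at distance 60000, which is the post's point: a shorter match with a small offset can be the cheaper choice overall.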


  28. #28
    Moderator
    Excellent!

  29. #29
    encode
    TC on ENWIK8 and ENWIK9:
    ENWIK8: 29,883,293 bytes
    ENWIK9: 262,724,249 bytes

  30. #30
    Moderator
    Not as good as "TC 5.0 dev 11" or "LZPXj 1.1d" on these particular files.


    TC 5.0 dev 11
    ENWIK8: 27,293,396
    ENWIK9: 242,199,762

    LZPXj 1.1d
    ENWIK8: 28,386,512
    ENWIK9: 246,468,866


