
Thread: TC 5.0dev11 is here - Time to gain compression!

  1. #1
    encode
    What's new:
    + Increased memory usage from 24 MB to 56 MB, resulting in higher compression
    + Moving towards PPM - PPM is now used more frequently, which improves compression

    Enjoy!

    Link:
    Download TC 5.0dev11 (31 KB)


  2. #2
    encode
    TC 5.0dev11 on Large Text Compression Benchmark:

    ENWIK8: 27,293,396 bytes
    ENWIK9: 242,199,762 bytes (c 446 sec, d 393 sec)

    Memory usage: 56 MB

    P4 3.0 GHz, 1 GB RAM, Windows XP SP2


  3. #3
    encode
    Just a few notes about the results. As you can see, compression on ENWIK9 improved, but not by much. As I said, I'm moving towards PPM - i.e. TC now uses LZP only when it can code the data more efficiently than PPM, and the encoder checks this at each step. That's why compression speed is affected, while decompression speed stays about the same. This trick doesn't work on this XML file, which costs some compression (about 3 MB, or a 0.2% loss), but in most cases it helps. Since I'm writing the compressor not only for ENWIK9, I'll keep this feature.
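
    For illustration, here is a minimal, self-contained sketch of the kind of per-step decision described above. It is my own toy example, not TC's code: the order-3 hash, the MIN_MATCH threshold and the demo string are assumptions, and where TC compares a match against the estimated PPM cost of the same bytes, this sketch just uses a fixed length threshold.

    // Toy LZP-style "match or literal" walk over a buffer (illustration only).
    // TC additionally checks whether PPM would code the same bytes more cheaply;
    // here a fixed MIN_MATCH threshold stands in for that cost comparison.
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    static const int HASH_BITS = 16;
    static const size_t MIN_MATCH = 4;                       // assumed threshold

    static uint32_t hash3(const uint8_t* p) {                // order-3 context hash
        return ((p[0] << 11) ^ (p[1] << 5) ^ p[2]) & ((1u << HASH_BITS) - 1);
    }

    int main() {
        const char* text = "abracadabra abracadabra abracadabra";
        const uint8_t* buf = reinterpret_cast<const uint8_t*>(text);
        const size_t size = std::strlen(text);

        std::vector<size_t> last_pos(1u << HASH_BITS, SIZE_MAX);
        size_t pos = 3, match_bytes = 0, literal_bytes = 3;  // first 3 bytes are literals

        while (pos < size) {
            const uint32_t h = hash3(buf + pos - 3);
            const size_t pred = last_pos[h];                 // last position of this context
            last_pos[h] = pos;

            size_t len = 0;
            if (pred != SIZE_MAX)
                while (pos + len < size && buf[pred + len] == buf[pos + len]) ++len;

            if (len >= MIN_MATCH) {                          // code as an LZP match
                match_bytes += len;
                pos += len;
            } else {                                         // literal (PPM-coded in TC)
                ++literal_bytes;
                ++pos;
            }
        }
        std::printf("match bytes: %zu, literal bytes: %zu\n", match_bytes, literal_bytes);
        return 0;
    }

    On this toy input most of the repeated text ends up in one long match; TC's real check would additionally compare that match against the PPM cost before committing to it.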


  4. #4
    encode
    TC 5.0dev11 on Calgary Corpus:

    bib: 28,236 bytes
    book1: 242,799 bytes
    book2: 164,353 bytes
    geo: 62,258 bytes
    news: 116,447 bytes
    obj1: 10,462 bytes
    obj2: 75,290 bytes
    paper1: 16,774 bytes
    paper2: 25,823 bytes
    pic: 52,816 bytes
    progc: 12,601 bytes
    progl: 14,915 bytes
    progp: 10,468 bytes
    trans: 16,398 bytes

    total: 849,640 bytes, 2.1635 bpb


  5. #5
    Member (Uruguay)
    AbiWord source code: 110,750 KB
    TC10: 22,190 KB, process time = 51.578
    TC11: 21,696 KB, process time = 52.687

  6. #6
    Guest
    It has now overtaken LZPXj on the large text benchmark!

  7. #7
    encode
    It's only the beginning. The TC file compressor is the flagship of LZP coders.

    In the next versions I'll add:
    + Special filters for EXE, BMP, WAV, and other files.
    + Special filters with auto-detection and data analysis. These filters are: a multi-media filter (for pictures, tables, audio files, etc.) and an x86 filter for executable files (Windows and other x86 code).

    Some of these filters were developed specifically for PIMPLE, and some of them are currently unused. Note that with these filters TC can simply outperform LZPXJ and others.
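
    Since the x86 filter comes up here, a generic sketch may help: the classic E8/E9 transform rewrites relative CALL/JMP targets as absolute addresses, so repeated calls to the same function become identical byte strings that the main coder can match. This is only the textbook technique under my own assumptions; TC's and PIMPLE's actual filters may differ.

    // Generic E8/E9 call/jump filter sketch (not TC's or PIMPLE's actual code).
    // Assumes a little-endian host, as on x86; unsigned arithmetic wraps mod 2^32,
    // which is what we want for address math.
    #include <cstdint>
    #include <cstddef>
    #include <cstring>

    // Forward transform, applied before compression: rel32 -> absolute target.
    void e8e9_encode(uint8_t* data, size_t size) {
        for (size_t i = 0; i + 5 <= size; ++i) {
            if (data[i] == 0xE8 || data[i] == 0xE9) {          // CALL rel32 / JMP rel32
                uint32_t rel;
                std::memcpy(&rel, data + i + 1, 4);             // little-endian operand
                uint32_t abs = rel + static_cast<uint32_t>(i);  // same target -> same bytes
                std::memcpy(data + i + 1, &abs, 4);
                i += 4;                                         // skip the operand
            }
        }
    }

    // Inverse transform, applied after decompression: absolute target -> rel32.
    void e8e9_decode(uint8_t* data, size_t size) {
        for (size_t i = 0; i + 5 <= size; ++i) {
            if (data[i] == 0xE8 || data[i] == 0xE9) {
                uint32_t abs;
                std::memcpy(&abs, data + i + 1, 4);
                uint32_t rel = abs - static_cast<uint32_t>(i);
                std::memcpy(data + i + 1, &rel, 4);
                i += 4;
            }
        }
    }

    Multimedia filters typically follow the same two-pass pattern, delta-coding pixels or audio samples before the main coder sees them.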


  8. #8
    encode
    TC 5.0dev11 on SFC:

    A10.jpg: 854,284 bytes
    acrord32.exe: 1,693,659 bytes
    english.dic: 921,281 bytes
    FlashMX.pdf: 3,758,309 bytes
    fp.log: 620,291 bytes
    mso97.dll: 2,049,029 bytes
    ohs.doc: 852,817 bytes
    rafale.bmp: 949,447 bytes
    vcfiu.hlp: 701,322 bytes
    world95.txt: 579,991 bytes

    total: 12,980,430 bytes


  9. #9
    encode
    TC 5.0dev11 on Large Canterbury Corpus:

    bible.txt: 880,731 bytes
    E.coli: 1,148,138 bytes
    world192.txt: 491,094 bytes


  10. #10
    Guest
    With these filters TC can simply outperform LZPXJ and others.

    I really hope it will outperform most other compressors!

  11. #11
    encode
    TC's new home:
    http://www.encode.su/tc/

  12. #12
    encode
    Fun facts (a post at M Software's forums; Anonymous is me):

    Posted: Sun Nov 20, 2005, 1:36 pm

    --------------------------------------------------------------------------------


    Anonymous wrote:
    Now, to significantly improve the compression performance of LZPX, I must encode literals with PPM.


    OK, a few warnings. Once you go beyond order-1 or order-2, speed will be significantly affected, because matches will no longer be the most efficient way to encode most data.

    PPM also has issues with memory management. If you use a PPMd-type model (Dmitry's PPMd var. I, for example; a prefix tree structure IIRC), then memory usage grows linearly with the file size. It is not easy to bound this without ruining compression (PPMd just restarts the model).
    The other downside is that the model requires sequential characters, because it relies on the prefix nature of the input to keep context lookup O(1). This makes it hard to use in a hybrid algorithm, since you would have to update the PPM model for each character within a match, which slows things down and hurts compression.

    OK, so if you want to use PPM models, the best idea is probably to use a hash table, similar to PAQ. The downside is that this will get slow beyond order-3 or order-4. This gives you a bounded model size, although it may not be terribly efficient in terms of the number of contexts for a given model size.

    I would actually recommend looking into a PAQ encoder. If you use a PAQ6-style encoder with only a few models, you will find it may be possible to beat PPM in terms of both compression and speed. The latest FPW algorithms are a testament to this, and are capable of speed and compression better than the best PPM we have.

    Anonymous wrote:
    What escape strategy should be used - PPMC, or maybe full blending? Also, I think a lower-order PPM should be used (max order of 3 or 2). And, I guess, a hash table should be used for access to the statistics (like in PAQ1). And which set of models is best: order-3-1-0, order-3-0, order-2-0, ...?


    Each thing that you add will slow compression down - blending, SEE, SSE, etc. Having said that, each of those will also provide significant improvements in compression.
    Note that PAQ can outdo a PPM implementation with all the bells and whistles in terms of speed, and sometimes in compression.

    Malcolm

    --------------------------------------------------------------------------------

    ...which means I'm implementing exactly what Malcolm suggested, since it's the best solution. I think Malcolm Taylor is the #1 man in data compression. So, in fact, the current TC is really LZPX 2.0...
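
    To make the hash-table idea from Malcolm's reply concrete, here is a minimal sketch of a bounded hashed context model. The table size, the hash, and the saturation rule are my assumptions, not PAQ's or TC's layout: statistics are indexed by a hash of the last two bytes, so memory stays fixed no matter how large the file is, at the cost of occasional collisions.

    // Minimal hashed order-2 statistics table (illustration only).
    // Memory is bounded by TABLE_BITS regardless of input size; colliding
    // contexts simply share a slot, which costs some accuracy.
    #include <cstdint>
    #include <cstddef>
    #include <vector>

    class HashedOrder2Model {
        static constexpr uint32_t TABLE_BITS = 16;             // 2^16 contexts x 256 symbols
        std::vector<uint16_t> counts;                            // about 32 MB of 16-bit counters
    public:
        HashedOrder2Model() : counts(size_t(1u << TABLE_BITS) * 256, 0) {}

        static uint32_t hash(uint8_t c1, uint8_t c2) {           // order-2 context hash
            uint32_t h = c1 * 2654435761u ^ c2 * 2246822519u;
            return (h >> 13) & ((1u << TABLE_BITS) - 1);
        }
        uint16_t freq(uint8_t c1, uint8_t c2, uint8_t sym) const {
            return counts[size_t(hash(c1, c2)) * 256 + sym];
        }
        void update(uint8_t c1, uint8_t c2, uint8_t sym) {
            uint16_t& c = counts[size_t(hash(c1, c2)) * 256 + sym];
            if (c < 65535) ++c;                                   // saturate; real coders rescale
        }
    };

    A real coder would blend or mix the counts from several such tables (order-3, order-1, order-0), which is essentially the PAQ-style route Malcolm recommends.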

  13. #13
    encode
    Performance on Textures.tar - 2,135,764,480 bytes

    Textures from the game Unreal Tournament 2004

    TC 5.0dev11: 900,786,497 bytes
    RAR 3.51, -m5: 920,545,874 bytes
    BZIP2 1.0.2, --best: 930,756,313 bytes
    PKZIP 2.50, -maximum: 1,005,723,761 bytes
    Original: 2,135,764,480 bytes

    Also note that only PKZIP was faster than TC; the other compressors were incomparably slower.


  14. #14
    Guest
    In the next versions I'll add:
    + Special filters for EXE, BMP, WAV, and other files.
    + Special filters with auto-detection and data analysis. These filters are: a multi-media filter (for pictures, tables, audio files, etc.) and an x86 filter for executable files (Windows and other x86 code).


    The next versions are going to be amazing!


  15. #15
    encode
    The direction of future improvements depends completely on testing results (MFC and others) - first of all, the pure engine must achieve a certain compression strength at a reasonable speed. I think TC's main engine must achieve 66-67% or more on the tarred SFC. Only then, once the engine's power is ensured, can we think about filters. For myself, though, I'm already comparing results with and without filters. At least, by my calculations, TC 5.0dev11 beats WinACE on SFC. And don't forget, WinACE has lots of multimedia/executable filters. Also, I think we must be careful with filters: for example, as with PIMPLE, the BMP filter hurts compression on 'rafale.bmp'! I also have additional ideas for future improvements:
    + Add support for multiple files (and folders). This feature would really turn TC into a complete file archiver. I also think I'll make archive support non-solid. First, this is much simpler to implement, and second, it can even gain compression - since we use a PPM algorithm here, and in most cases solid mode just hurts.
    + I know a few tricks to improve compression without affecting decompression speed. Note that the current 5.0dev11 version already decompresses faster than it compresses.
    + And finally, I have an idea to create a GUI file archiver as a replacement for PIMPLE - currently the main program of encode.su. The main goal of this program (as with TC) is fast compression (faster than WinRAR and 7-Zip) with a good ratio (higher than ZIP and, at least, WinACE). I also think this program must be simpler than PIMPLE - PIMPLE has lots of filters, analysis stages, and options, and as a result it's really hard to manage such a big project. So, at some point, filters must be avoided.

  16. #16
    Guest
    + Add support for multiple files (and folders). This feature would really turn TC into a complete file archiver. I also think I'll make archive support non-solid. First, this is much simpler to implement, and second, it can even gain compression - since we use a PPM algorithm here, and in most cases solid mode just hurts.

    That's fantastic news!

  17. #17
    Guest
    Note that the current 5.0dev11 version already decompresses faster than it compresses.

    What's the reason for this statement? I would expect roughly the same speed for compression and decompression (because the baseline is LZP+PPM). Some simple tests support this expectation, too. Actually, in most cases decompression is slower by a small margin.

  18. #18
    encode
    What's the reason for this statement? I would expect roughly the same speed for compression and decompression (because the baseline is LZP+PPM). Some simple tests support this expectation, too. Actually, in most cases decompression is slower by a small margin.

    That's true only since 5.0dev11 - look at the 'Large Text Compression Benchmark'. It's because the encoder has an extra task: it checks whether the current match can be coded efficiently or not. The decoder doesn't have this stage. Of course, PPM-based compressors normally have slower decompression than compression (by about 5-10%), so TC is an unusual case. And again, speed depends heavily on the data! For example, TC is a few times slower on non-redundant data than on typical data. So in some cases decompression can indeed be slower.

  19. #19
    encode
    Also check out the 'File Collector' - it's the base for the future archiver.

    Download the 'File Collector' (22 KB)

    Usage:

    to add all files from the current folder to the archive 1.fc, type:

    fcol a 1.fc *.*

    to add all files from the c:\windows folder to d:\win.fc, type:

    cd c:\windows

    fcol a d:\win.fc *.*

    - i.e. you cannot use a c:\windows\*.* wildcard; instead you must change the current folder to the folder containing the files

    you can add files to an archive multiple times. Example:

    cd test1

    fcol a c:\1.fc *.*

    cd ..

    cd test2

    fcol a c:\1.fc *.*

    - the file c:\1.fc will then contain all files from the 'test1' and 'test2' folders

    the archive CAN contain files with the same name! So, be careful.

    To extract:

    fcol x 1.fc

    - all files from '1.fc' will be extracted to the current folder

    To extract all files to the 'test' folder, type:

    cd test

    fcol x c:\1.fc

    NOTE: fcol.exe must be copied to the system folder (c:\windows), or the folder containing fcol.exe must be listed in the %PATH% system variable, so that fcol.exe can be run from any path.


  20. #20
    encode
    Different archive modes and results:

    SFC files

    TC 5.0dev12-normal mode: 12,980,543 bytes
    TC 5.0dev12-solid mode: 13,314,929 bytes
    TC 5.0dev12+TAR: 13,323,316 bytes

  21. #21
    Guest
    I think that TC will take the title that 7-Zip used to have.

    "Archiver with highest compression ratio"

    Remember?

  22. #22
    Guest
    I think that TC will take the title that 7-Zip used to have.
    "Archiver with highest compression ratio"


    I don't think that's Ilia's intention. 7-Zip's compression ratios are much better in most cases, and WinRK often has even better ratios, but it is also even slower.

    The speed/compression trade-off is a completely different story. Providing a good compression ratio while keeping speed at a usable level should be the goal. TC is well on its way there...

  23. #23
    encode
    I think I must create a new archiver and keep TC untouched. I already added the multiple-files feature to the new 'nameless' archiver, and it works! Currently I use solid-mode archives, since that is faster - there's no need to reset the entire model for each file. However, for now you cannot add files to an existing archive, and you cannot view archive contents either. The new archiver will have a CABARC-like interface.

    to create an archive:
    noname n 1.arc *

    to extract all files:
    noname x 1.arc

    etc.

    Any name suggestions are welcome!


  24. #24
    Guest
    Any name suggestions are welcome!

    Will give it some thought!

  25. #25
    encode
    Expected command-line interface of the new archiver:

    D:\>arc n 1.arc *
    arc DEMO (c) 2006 ilia muraviev
    creating new archive '1.arc':
    ■ adding 1160.206...
    ■ adding ABOUTMPC.TXT...
    ■ adding Aryx.s3z...
    ■ adding LICENSE.TXT...
    ■ adding MPTRACK.CNT...
    ■ adding MPTRACK.EXE...
    ■ adding mptrack.GID...
    ■ adding MPTRACK.HLP...
    ■ adding mptrack.ini...
    ■ adding MPTRACK.lan...
    done

  26. #26
    Guest
    Looks good!

  27. #27
    Guest
    I think I must create a new archiver and keep TC untouched. I already added the multiple-files feature to the new 'nameless' archiver, and it works!
    Any name suggestions are welcome!


    Of course, it is your decision. But I do not understand your development strategy. Why do you need a new program? Why not add the multiple-files feature to your existing codec(s)?

    The only reason I can imagine for creating a new program is that you finally want to make something useful, i.e. something not considered an experimental codec. I would like to see PIMPLE/LZPX/TC compression supported by one archiver with all the common archiver features (such as multiple files, SFX, ...), so it can cover a wide range of compression needs. Is that what you want to do?

  28. #28
    encode
    Yes, I want to write a complete archiver without the 'experimental' stamp.

    About the name:

    + TC is not an especially original name. For example, if you type 'TC' or 'TC compressor' into Google, you'll find anything but the TC compressor. Furthermore, the name collides with Linux/Unix, since a 'tc' command already exists there.

    + LZPX is an open-source program, and I don't think I will make the new archiver open source as well. LZPXJ is something like LZPX by Jan, and it is currently maintained by Jan Ondrus only, if at all. The current TC engine is derived from PIMPLE and has a completely different structure from LZPX/LZPXJ. In any case, the current LZPXJ uses the same ideas as TC.

    + PIMPLE. Development of this program has stopped, so it is no longer an 'experimental' codec. In addition, I want to keep this software untouched, since it has a high compression ratio and a GUI.


  29. #29
    encode
    Good news!
    Finally, I've implemented a sliding window! So the new archiver uses the full power of solid archiving! I implemented this because I found that restarting the LZP model (not the PPM model) for each file completely ruins compression. It also gives some compression gain on large files.
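
    For what it's worth, here is a minimal sketch of the sliding-window idea; the window size, hash, and layout are my guesses, not the new archiver's actual code. The LZP table keeps absolute positions, and a prediction is simply ignored once it has slid out of the window, so nothing has to be restarted at file boundaries.

    // Sliding-window LZP history sketch (illustration only; sizes are assumptions).
    // Predictions older than WINDOW bytes are treated as misses instead of
    // resetting the whole model between files.
    #include <cstdint>
    #include <cstddef>
    #include <vector>

    class SlidingLzp {
        static constexpr size_t WINDOW = 16u << 20;             // assumed 16 MB window
        std::vector<uint8_t>  ring;                              // last WINDOW bytes seen
        std::vector<uint64_t> table;                             // context hash -> absolute position
        uint64_t abs_pos = 0;
    public:
        SlidingLzp() : ring(WINDOW), table(1u << 20, UINT64_MAX) {}

        // Byte predicted to follow context (c1,c2,c3), or -1 if none or too old.
        int predict(uint8_t c1, uint8_t c2, uint8_t c3) const {
            uint64_t p = table[hash(c1, c2, c3)];
            if (p == UINT64_MAX || abs_pos - p > WINDOW) return -1;  // slid out of the window
            return ring[p % WINDOW];
        }
        // Record that 'next' followed context (c1,c2,c3) at the current position.
        void update(uint8_t c1, uint8_t c2, uint8_t c3, uint8_t next) {
            table[hash(c1, c2, c3)] = abs_pos;
            ring[abs_pos % WINDOW]  = next;
            ++abs_pos;
        }
    private:
        static uint32_t hash(uint8_t a, uint8_t b, uint8_t c) {
            return (a * 2654435761u ^ b * 2246822519u ^ c * 3266489917u) & ((1u << 20) - 1);
        }
    };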


  30. #30
    Guest
    Yes, I want to write a complete archiver without the 'experimental' stamp.

    About the name:

    + TC is not an especially original name. For example, if you type 'TC' or 'TC compressor' into Google, you'll find anything but the TC compressor. Furthermore, the name collides with Linux/Unix, since a 'tc' command already exists there.

    + LZPX is an open-source program, and I don't think I will make the new archiver open source as well. LZPXJ is something like LZPX by Jan, and it is currently maintained by Jan Ondrus only, if at all. The current TC engine is derived from PIMPLE and has a completely different structure from LZPX/LZPXJ. In any case, the current LZPXJ uses the same ideas as TC.

    + PIMPLE. Development of this program has stopped, so it is no longer an 'experimental' codec. In addition, I want to keep this software untouched, since it has a high compression ratio and a GUI.


    What about the name IMArc?

    Your initials + Arc for Archiver. I got the idea from Phil Katz + Zip = PKZip.


