Page 1 of 2
Results 1 to 30 of 37

Thread: TANGELO - new compressor (derived from PAQ8/FP8)

  1. #1
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts

    TANGELO - new compressor (derived from PAQ8/FP8)

TANGELO is a single-file compressor (not an archiver) derived from PAQ8/FP8 and licensed under the GPL.

    I removed a lot of stuff from FP8 to make it as simple as possible, so it has a small source code and it is easier to understand how its core works (I think). The compression engine should still be the same as the one in FP8.
    The specialized models/transformations for EXE / images / audio / JPEG / ... are all removed. You can't pack multiple files with TANGELO. You can't select the amount of memory it uses: about 550-600 MB (same as FP8 with option -7).
    Its source is about 23 KB (compared to 149 KB for FP8). It should have similar performance to FP8 on text and unknown/default data.
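For readers new to the PAQ family, the core that remains can be sketched roughly like this (a toy illustration with made-up names, not TANGELO's actual code): each bit is predicted by a few simple models, the predictions are mixed in the logistic domain, and the mixed probability is what an arithmetic coder would use.

```cpp
#include <cmath>
#include <vector>

// stretch(p) = ln(p/(1-p)) maps a probability into the logistic domain;
// squash is its inverse. PAQ-family coders mix predictions there because
// a weighted sum of stretched probabilities handles confident models well.
static double stretch(double p) { return std::log(p / (1.0 - p)); }
static double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Toy adaptive bit model: its probability is nudged toward each seen bit.
struct BitModel {
    double p = 0.5;
    void update(int bit, double rate = 0.05) { p += rate * (bit - p); }
};

// Linear mixer trained by gradient descent on coding loss.
struct Mixer {
    double w[2] = {0.3, 0.3};
    double mix(const double st[2]) const { return squash(w[0] * st[0] + w[1] * st[1]); }
    void update(const double st[2], double p, int bit, double lr = 0.002) {
        for (int i = 0; i < 2; ++i) w[i] += lr * (bit - p) * st[i];
    }
};

// Total bits an arithmetic coder would spend coding the sequence with an
// order-0 and an order-1 bit model mixed together.
double code_cost(const std::vector<int>& bits) {
    BitModel m0, m1[2];
    Mixer mx;
    int prev = 0;
    double cost = 0;
    for (int bit : bits) {
        double st[2] = {stretch(m0.p), stretch(m1[prev].p)};
        double p = mx.mix(st);
        if (p < 0.001) p = 0.001;   // clamp as real coders do
        if (p > 0.999) p = 0.999;
        cost += -std::log2(bit ? p : 1.0 - p);
        mx.update(st, p, bit);
        m0.update(bit);
        m1[prev].update(bit);
        prev = bit;
    }
    return cost;
}
```

On an alternating 0101... stream the order-1 model quickly dominates the mix and the measured cost falls well below one bit per input bit.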

    Code:
    Usage: TANGELO <command> <infile> <outfile>
    
    <Commands>
     c       Compress
     d       Decompress
    Attached Files

  2. The Following 9 Users Say Thank You to Jan Ondrus For This Useful Post:

    Bulat Ziganshin (17th June 2013), encode (18th June 2013), Mat Chartier (17th June 2013), Matt Mahoney (17th June 2013), Mike (19th June 2013), Nania Francesco (25th June 2013), samsat1024 (7th July 2013), Skymmer (20th June 2013), Stephan Busch (18th June 2013)

  3. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Updated Silesia benchmark. http://mattmahoney.net/dc/silesia.html
    Compared to fp8_v3 -7, compression is better on structured text (nci and webster) but worse on x86 (ooffice), I guess because there is no E8E9 filter.

    LTCB will have to run overnight.

    Edit: LTCB updated. http://mattmahoney.net/dc/text.html#1532
    Speed is about the same as fp8_v3 -8 (5.5 hours to compress or decompress enwik9) but compression is a bit worse due to using only half as much memory.
    Last edited by Matt Mahoney; 18th June 2013 at 16:28.

  4. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Jan Ondrus (18th June 2013)

  5. #3
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    drt|tangelo would probably compress enwik8 and/or enwik9 tighter than drt|lpaq9m, while using about a third of the memory.

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  6. #4
    Member
    Join Date
    Jun 2013
    Location
    Canada
    Posts
    36
    Thanks
    24
    Thanked 47 Times in 14 Posts
    Ran it with DRT on enwik8, enwik9:

    enwik8: drt|tangelo 17681785 bytes in 809.17s
    enwik9: drt|tangelo 148758265 bytes in 8153.09s

    Decompression not verified. Computer: Core i7 2630QM - 8 GB ram

    Very nice compression!

  7. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Some results with drt + various compressors (as of June 2010). http://mattmahoney.net/dc/text.html#1440

    lpaq9m is tuned for drt output on enwik8/9.

  8. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    Nania Francesco (25th June 2013)

  9. #6
    Member Alexander Rhatushnyak's Avatar
    Join Date
    Oct 2007
    Location
    Canada
    Posts
    232
    Thanks
    38
    Thanked 80 Times in 43 Posts
    Quote Originally Posted by Matt Mahoney View Post
    lpaq9m is tuned for drt output on enwik8/9.
    It doesn't look like it's heavily tuned; only slightly more so than the following two:

    Compressor ... ratio (dic+drt compressed size divided by enwik8 compressed size)
    paq8px_v67 ... 0.9480
    paq8l ... 0.9483
    ...
    lpaq9m ... 0.9478

    (from the last table in http://mattmahoney.net/dc/text.html#1440 )
    Last edited by Alexander Rhatushnyak; 26th June 2013 at 00:44.

    This newsgroup is dedicated to image compression:
    http://linkedin.com/groups/Image-Compression-3363256

  10. #7
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts

    version 2.0

    TANGELO 2.0
    - removed APMs
    - removed some modeling (simpler model)
    - simpler StateMap and ContextMap
    - removed DMC model
    - uses less memory and is faster
    - less compression
    - state table from Mat Chartier, from this thread: http://encode.su/threads/1742-Improv...state-machines
    Attached Files

  11. The Following 3 Users Say Thank You to Jan Ondrus For This Useful Post:

    Matt Mahoney (7th July 2013), Nania Francesco (6th July 2013), samsat1024 (7th July 2013)

  12. #8
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Thanks, Jan, for the great job you're doing, but I think you should credit yourself as the creator of the program, which, although similar, is very different from PAQ8. I think you should add LZP to make it as fast as PAQ9 (you could decrease the number of contexts and keep only the most significant ones). You should create an archiver (Sami Runsas has put a free one online), and solid mode would improve things by about 10-20%. I would like to work with you, Matt and Mat Chartier on a super archiver!

    WCC2013 results are excellent!

    Best Regards, Francesco!

  13. #9
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts
    Quote Originally Posted by Nania Francesco View Post
    Thanks, Jan, for the great job you're doing, but I think you should credit yourself as the creator of the program, which, although similar, is very different from PAQ8. I think you should add LZP to make it as fast as PAQ9 (you could decrease the number of contexts and keep only the most significant ones). You should create an archiver (Sami Runsas has put a free one online), and solid mode would improve things by about 10-20%. I would like to work with you, Matt and Mat Chartier on a super archiver!

    WCC2013 results are excellent!

    Best Regards, Francesco!
    I don't think I will have time to develop a new program. But I have one idea I want to experiment with: use static Huffman coding before modeling and context mixing to improve speed on redundant data (fewer bits would be modeled, mixed and coded per byte on average). It would be somewhat similar to how Huffman-coded data is handled in the paq8 JPEG model. Has any of you tried something like that? What do you think?
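For what it's worth, the potential gain of this idea can be estimated without touching the coder: build static Huffman code lengths from the byte frequencies and compare the average code length to 8 bits per byte. A rough sketch (hypothetical helper names, not TANGELO code):

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Build Huffman code lengths for the given byte frequencies; symbols
// with zero frequency get length 0 (they are never coded).
std::vector<int> huffman_lengths(const std::vector<uint64_t>& freq) {
    int n = (int)freq.size();
    std::vector<uint64_t> f(freq);
    std::vector<int> parent(n, -1);
    typedef std::pair<uint64_t, int> QE;  // (subtree weight, node index)
    std::priority_queue<QE, std::vector<QE>, std::greater<QE> > q;
    for (int s = 0; s < n; ++s)
        if (f[s]) q.push(QE(f[s], s));
    while (q.size() > 1) {                // merge the two lightest subtrees
        QE a = q.top(); q.pop();
        QE b = q.top(); q.pop();
        int id = (int)f.size();
        f.push_back(a.first + b.first);
        parent.push_back(-1);
        parent[a.second] = parent[b.second] = id;
        q.push(QE(a.first + b.first, id));
    }
    std::vector<int> len(n, 0);
    for (int s = 0; s < n; ++s) {
        if (!freq[s]) continue;
        for (int p = parent[s]; p != -1; p = parent[p]) ++len[s];
        if (len[s] == 0) len[s] = 1;      // degenerate one-symbol alphabet
    }
    return len;
}

// Average Huffman code length in bits per byte: roughly the number of
// bits the bit-level models would see per input byte, instead of 8.
double avg_bits(const std::vector<uint64_t>& freq) {
    std::vector<int> len = huffman_lengths(freq);
    uint64_t total = 0, bits = 0;
    for (size_t s = 0; s < freq.size(); ++s) {
        total += freq[s];
        bits += freq[s] * (uint64_t)len[s];
    }
    return total ? (double)bits / (double)total : 0.0;
}
```

On skewed byte distributions the average drops well below 8, which is where the speed gain would come from; on nearly uniform data it stays at 8 and nothing is saved.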

  14. #10
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    I think you can answer that yourself. It's like kung fu: if you can't beat the enemy, become his friend. Simply copy the data with medium probability, as in CSC 3.2! With Huffman you would just end up making a hole in the water (wasting your effort)!

  15. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    A different alphabet decomposition (a mapping of symbols from an N-ary alphabet to a set of prefix codes) certainly is a good idea. I've implemented order-1 Huffman decomposition (256 Huffman trees, one per order-1 context) in my old M1 and M1x2 compressors; see my homepage and check the most recent version. There was a speedup of roughly 30-50% and compression remained almost the same. I guess in your case it'll be bigger, since I mixed at most four models. However, the way you group symbols of the same Huffman code length has some influence on compression. I use a heuristic called "Huffman-III decomposition": http://www.sps.ele.tue.nl/members/f....CTW/Ben99x.pdf

    Hope this helps.
    Cheers
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  16. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  17. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by Jan Ondrus View Post
    I don't think I will have time to develop a new program. But I have one idea I want to experiment with: use static Huffman coding before modeling and context mixing to improve speed on redundant data (fewer bits would be modeled, mixed and coded per byte on average). It would be somewhat similar to how Huffman-coded data is handled in the paq8 JPEG model. Has any of you tried something like that? What do you think?
    Depends on what you want to achieve. If you are doing backups, then the most important speed optimizations are detecting already-compressed data (to store it) and deduplication. This is because on typical disks most of the data is already compressed, and a lot of space tends to be wasted on extra copies of files. The next best tricks are the E8E9 transform (because x86 and x86-64 are common uncompressed types) and grouping small files with the same extension so they compress together. Most compressible data is binary rather than text, so it is useful to have sparse models and fixed-record-size models, and you don't need a lot of memory (an exception is DNA). Most benchmarks are not realistic in this sense: they tend to have large text files, no duplication, and exclude already-compressed files.
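The E8E9 transform mentioned above is simple enough to sketch. The version below follows the common scheme (relative CALL/JMP targets rewritten as absolute addresses); it is an illustration of the idea, not the exact filter used in fp8:

```cpp
#include <cstddef>
#include <cstdint>

// E8/E9 transform: after a CALL (0xE8) or JMP (0xE9) opcode, x86 code
// stores a 32-bit little-endian relative displacement. Rewriting it as
// the absolute target makes repeated calls to the same function yield
// identical byte strings, which plain context models compress better.
void e8e9(uint8_t* buf, size_t n, bool encode) {
    for (size_t i = 0; i + 5 <= n; ++i) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t d = (uint32_t)buf[i + 1] | (uint32_t)buf[i + 2] << 8 |
                         (uint32_t)buf[i + 3] << 16 | (uint32_t)buf[i + 4] << 24;
            uint32_t a = encode ? d + (uint32_t)(i + 5) : d - (uint32_t)(i + 5);
            buf[i + 1] = (uint8_t)a;
            buf[i + 2] = (uint8_t)(a >> 8);
            buf[i + 3] = (uint8_t)(a >> 16);
            buf[i + 4] = (uint8_t)(a >> 24);
            i += 4;  // never reinterpret the displacement bytes as opcodes
        }
    }
}

// Round-trip check on a tiny synthetic buffer (made-up bytes, not real code).
bool e8e9_roundtrip() {
    uint8_t buf[12] = {0x90, 0xE8, 0x10, 0x00, 0x00, 0x00,
                       0x90, 0xE9, 0xFE, 0xFF, 0xFF, 0xFF};
    uint8_t orig[12];
    for (int i = 0; i < 12; ++i) orig[i] = buf[i];
    e8e9(buf, 12, true);
    e8e9(buf, 12, false);
    for (int i = 0; i < 12; ++i)
        if (buf[i] != orig[i]) return false;
    return true;
}
```

Because the opcode bytes themselves are never modified and both directions skip the same four displacement bytes, decode sees exactly the same opcode positions as encode, so the transform round-trips losslessly.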

    Edit: updated enwik9. http://mattmahoney.net/dc/text.html#1532
    Last edited by Matt Mahoney; 8th July 2013 at 22:42.

  18. #14
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts

    version 2.1

    Here is TANGELO 2.1.
    It is faster, with weaker compression again.

    changes:
    - one mixer used per bit (selected from 256 possible mixers by the previous byte as context)
    - removed all models except match and orders 0, 1, 2, 3, 4, 6
    - higher-order models (2, 3, 4, 6) are disabled on random-looking (already compressed) data for better speed
    - probabilities for states are now fixed (StateMap class replaced by an array of probabilities)
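The first change, selecting one of 256 mixers by the previous byte, can be sketched like this (toy code with made-up names, not TANGELO's source):

```cpp
#include <cmath>
#include <cstdint>

// One small mixer per possible previous byte. Different contexts (text,
// numbers, binary records) get separately adapted mixing weights instead
// of one global weight set that averages them all together.
struct Mixer {
    double w[7] = {0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2};  // one per model
    double mix(const double* st) const {
        double x = 0;
        for (int i = 0; i < 7; ++i) x += w[i] * st[i];
        return 1.0 / (1.0 + std::exp(-x));              // squash
    }
    void update(const double* st, double p, int bit, double lr) {
        for (int i = 0; i < 7; ++i) w[i] += lr * (bit - p) * st[i];
    }
};

Mixer mixers[256];       // selected by the previous byte
uint8_t prev_byte = 0;   // updated after every completed byte

// Per bit: select the context's mixer, predict, then train only it.
double predict_bit(const double* stretched_inputs, int bit) {
    Mixer& m = mixers[prev_byte];
    double p = m.mix(stretched_inputs);
    m.update(stretched_inputs, p, bit, 0.002);
    return p;
}

// Train one context on constant inputs and always-one bits; the selected
// mixer's prediction should climb while the other 255 stay untouched.
double train_constant(int steps) {
    double st[7] = {1, 1, 1, 1, 1, 1, 1};
    double p = 0;
    for (int i = 0; i < steps; ++i) p = predict_bit(st, 1);
    return p;
}
```

The point of the design is isolation: statistics learned in one byte context never drag down the weights of another, at the cost of slower adaptation per context.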
    Attached Files

  19. The Following User Says Thank You to Jan Ondrus For This Useful Post:

    Mat Chartier (22nd July 2013)

  20. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  21. #16
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts

    tiny change for x64 platforms

    Made a tiny change: now it works on x64 platforms (and it can reserve more than 2 GB of memory), without harming the initial setup.
    Attached Files

  22. The Following User Says Thank You to mahessel For This Useful Post:

    Jan Ondrus (21st July 2013)

  23. #17
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts

    version 2.3

    TANGELO 2.3

    - (re)added simple APM for better compression
    - some small changes for better speed
    Attached Files

  24. #18
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  25. #19
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    460
    Thanked 175 Times in 85 Posts
    This TANGELO 2.3 compile keeps crashing on my system. Which version of libstdc++-6.dll is needed?

    Ah... fixed.
    I had the version from 21.09.2011;
    it works with the version from 16.10.2012.
    Last edited by Stephan Busch; 25th July 2013 at 01:32.

  26. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I forgot to mention that tangelo.exe did not run because it was looking for some cygwin DLL files. I recompiled it from source for the test. The problem could be fixed by compiling with -static.

  27. #21
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    @Jan Ondrus, are you sure the construction around 'bytes_read' and 'bytes_written' is working properly? With 'enwik8' the variable 'rn' does not change ...

  28. #22
    Member
    Join Date
    Jun 2013
    Location
    Canada
    Posts
    36
    Thanks
    24
    Thanked 47 Times in 14 Posts
    @mahessel, rn should only become one if TANGELO detects that the input is compressing poorly. It is expected that rn never changes on highly compressible files such as text/XML.
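A cheap way to approximate such a detector, purely as an illustration (TANGELO's actual rn logic may well differ), is an order-0 entropy estimate over the input:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Order-0 entropy estimate in bits per byte. Values near 8.0 mean the
// byte histogram is flat, which is typical of already-compressed input.
double order0_entropy(const std::vector<uint8_t>& data) {
    uint64_t count[256] = {0};
    for (size_t i = 0; i < data.size(); ++i) ++count[data[i]];
    double h = 0.0, n = (double)data.size();
    for (int s = 0; s < 256; ++s)
        if (count[s]) {
            double p = count[s] / n;
            h -= p * std::log2(p);
        }
    return h;
}

// The 7.9 threshold is an arbitrary guess chosen for this illustration.
bool looks_random(const std::vector<uint8_t>& data) {
    return order0_entropy(data) > 7.9;
}
```

A real detector would work incrementally on a window and also consult higher-order statistics (a flat byte histogram alone does not prove incompressibility), but the principle is the same: when the estimate stays near 8 bits/byte, the expensive high-order models can be switched off.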

  29. #23
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts
    Quote Originally Posted by Mat Chartier View Post
    @mahessel, rn should only become one if TANGELO detects that the input is compressing poorly. It is expected that rn never changes on highly compressible files such as text/XML.
    Yes, exactly.

  30. #24
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Okay, clear.

    Changing the squash table to '0,2,6,11,20,33,52,82,126,193,290,430,626,888,1222,1616,2048,2479,2873,3207,3469,3665,3805,3902,3969,4013,4043,4062,4075,4084,4089,4093,4095' will improve compression by about 30 KB on enwik9.

  31. #25
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts
    Quote Originally Posted by mahessel View Post
    Okay, clear.

    Changing the squash table to '0,2,6,11,20,33,52,82,126,193,290,430,626,888,1222,1616,2048,2479,2873,3207,3469,3665,3805,3902,3969,4013,4043,4062,4075,4084,4089,4093,4095' will improve compression by about 30 KB on enwik9.
    Do you know why?

  32. #26
    Member mahessel's Avatar
    Join Date
    Apr 2010
    Location
    Netherlands
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    First, the curve should start at 0 and end at 4095 (just like the limiter in squash).
    Second, the shape of the curve is a choice; in this case a less steep slope performs better.
    For example, you can tweak the curve by using:

    Code:
    double t[4096];
    for (int n = 0; n < 4096; ++n)
        t[n] = 1.0 / (1.0 + exp((2048.0 - n) / TWEAKME));
    const double offset = t[0];
    const double scale = 4095.0 / (t[4095] - offset);
    for (int n = 0; n < 4096; ++n) {
        t[n] = (t[n] - offset) * scale;
        table[n] = round(t[n]);
    }

    Change TWEAKME to 300; the 'default' squash curve corresponds to about 150.
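For anyone who wants to try this, here is a self-contained version of that snippet, with TWEAKME as a parameter and the output table made explicit (my wrapper around mahessel's idea, not his exact code):

```cpp
#include <cmath>

// Build a 4096-entry squash table with the given steepness. The rescaling
// forces table[0] == 0 and table[4095] == 4095 exactly, as argued above.
// tweakme ~150 approximates the usual PAQ squash curve; ~300 is flatter.
void build_squash_table(double tweakme, int table[4096]) {
    double t[4096];
    for (int n = 0; n < 4096; ++n)
        t[n] = 1.0 / (1.0 + std::exp((2048.0 - n) / tweakme));
    const double offset = t[0];
    const double scale = 4095.0 / (t[4095] - offset);
    for (int n = 0; n < 4096; ++n)
        table[n] = (int)std::lround((t[n] - offset) * scale);
}
```

The resulting table is monotonically increasing with its midpoint near 2048, so it can drop straight into any squash implementation that does a table lookup on a stretched value in 0..4095.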

  33. #27
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts

    version 2.4

    TANGELO 2.4

    - added fast JPEG model based on model from paq8fthis_fast.cpp (http://cs.fit.edu/~mmahoney/compression/paq8fthis4.zip)
    - this version is without APM
    Attached Files

  34. The Following User Says Thank You to Jan Ondrus For This Useful Post:

    Nania Francesco (18th August 2013)

  35. #28
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    Vista SP2 32-bit:

    - missing files: libstdc++-6.dll, libgcc_s_dw2-1.dll
    - after downloading these 2 DLLs and copying them next to tangelo.exe, it works ...

    First look: it seems a little bit slow on my Core 2 Duo, but it has good compression; better than zpaq in my test. Needs more testing ...

    best regards
    Attached Files
    Last edited by joerg; 19th August 2013 at 05:05.

  36. The Following User Says Thank You to joerg For This Useful Post:

    Samuraikarte (19th June 2014)

  37. #29
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    tangelo results on Silesia
    Code:
      Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
    --------- ----- ----- ---- ----- ---- ----- ---- ----- ---- ----- ----- ---- -------------------
     37809279  2078 11228 2133  1029 1826  2168  867  2889 4283  5376  3636  291 tangelo 1.0
     41267068  2246 12479 2229  1320 2051  2330  978  3116 4478  5999  3716  321 tangelo 2.0
     44037765  2279 13895 2227  1580 2301  2449 1038  3298 4524  6306  3778  358 tangelo 2.3
     44847833  2299 14109 2283  1635 2328  2574 1050  3337 4653  6356  3846  371 tangelo 2.1
     44862127  2299 14121 2284  1631 2328  2575 1050  3343 4654  6356  3846  371 tangelo 2.4

  38. #30
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    I made a half-baked attempt at getting TANGELO to support standard input and output. It seems to work so far. Currently this is version 2.4.

    https://github.com/neheb/TANGELO

    edit: I should also mention that the recommended compiler switches are probably suboptimal; -O3 -msse2 seems to work best for me.


