
Thread: Compression Challenge

  1. #1
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts

    Compression Challenge

    I have been seeing that this community needs some fun, so I decided to put up a small challenge.

    Download this file: 93.5 MB. The compression method is RAR, best mode.

    Let's see if you can get the best out of it.

    BTW, the unpacked file measures up to 27 GB, so good luck to all who participate.




    https://mega.nz/#!Dp4WDagQ!z4EaN9hxKSbjYkQw6uMuJm1cVocHMxPsLArXGp8dVLA


    PS: Don't forget to add the method you used, along with screenshots or a txt file.


    [Attached thumbnail: Screenshot_2016-09-27_00-19-26.png]

  2. Thanks:

    Nania Francesco (27th September 2016)

  3. #2
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    538
    Thanks
    238
    Thanked 92 Times in 72 Posts
    Hi! Sounds good! BTW, is this about ratio only? I mean, using EMMA will likely get us the best results, but c'mon, it's 27 GB!
    What about some tradeoff between size and speed? You could state your preferences... something like compressed_bytes/cpu_cycles*100 or the like.

  4. #3
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    Well, you can experiment; use everything you've got on it.

    If you want, post your config, ratio and speeds.

    I did it for fun mostly, since I chose to use RAR of all compressors :P

  5. #4
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    538
    Thanks
    238
    Thanked 92 Times in 72 Posts
    Well... 1st "test": decompression is OK, but the method isn't RAR's best. It's method 3 of 5.
    2nd: <<spoiler alert>> This file is a permutation of characters made with a program, just like others we've seen here before, but far larger. So it can be compressed down to a few lines of code. Unfortunately, the resulting program would run until the end of time. I bet NanoZip's delta filter would do a fine job here. See this: http://encode.su/threads/2050-Specif...a-in-a-pattern
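    As an illustration of the "few lines of code" point: a minimal sketch, assuming the file simply enumerates every string over some fixed alphabet in order of increasing length. The alphabet and maximum length below are placeholders, not read from the actual file.
    Code:
    from itertools import product

    # Hypothetical generator for a super-wpalist.txt-like file: every string over
    # ALPHABET of length 1, 2, 3, ..., MAX_LEN, one per line.
    # ALPHABET and MAX_LEN are guesses for illustration only; larger values of
    # MAX_LEN approach the scale of the challenge file.
    ALPHABET = "0123456789"
    MAX_LEN = 6

    with open("wordlist.txt", "w") as out:
        for length in range(1, MAX_LEN + 1):
            for combo in product(ALPHABET, repeat=length):
                out.write("".join(combo) + "\n")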

  6. #5
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by Dimitri View Post
    Let's see if you can get the best out of it.
    First the fastest levels were tested, then the best with the slowest level. Each line shows: compressed size, compression time - decompression time, program - level - threads:
    16,458,474,814 bytes, 22.371 sec. - 31.507 sec., lzturbo - 10 - 12 threads
    16,415,339,284 bytes, 59.291 sec. - 46.073 sec., LZ4 - 1 - 12 threads
    13,659,827,941 bytes, 36.092 sec. - xx.xxx sec., eXdupe - 1 - 12 threads
    13,395,278,257 bytes, 17.588 sec. - 21.333 sec., Qpress - 1 - 12 threads
    10,844,251,156 bytes, 277.921 sec. - 54.006 sec., WinZpaq - 1 - 12 threads
    7,943,956,035 bytes, 177.123 sec. - 83.359 sec., Zstandard - 1 - 1 thread
    5,734,915,232 bytes, 170.700 sec. - 116.748 sec., Brotli - 1 - 1 thread
    5,089,397,494 bytes, 13.978 sec. - 13.082 sec., NanoZip - f - 12 threads
    4,583,875,630 bytes, 259.501 sec. - 97.935 sec., WinRAR - 1 - 12 threads
    4,561,879,785 bytes, 61.691 sec. - 80.123 sec., FreeArc - 1 - 12 threads
    1,945,972,179 bytes, 145.051 sec. - 237.651 sec., 7-Zip - 1 - 12 threads
    775,275,606 bytes, 259.925 sec. - 137.871 sec., Bsc - 0 - 12 threads
    20,867,631 bytes, 1190.341 sec. - 1100.830 sec., ZCM - 0 - 8 threads
    15,152,068 bytes, 1694.876 sec. - 1269.016 sec., ZCM - 7 - 8 threads

  7. #6
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    Someone had fun.

    I expect to see zstd with dictionary usage at some point.

  8. #7
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    1,841,275 bytes, 40,026 sec., EMMA Text (Slow).

  9. #8
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    278
    Thanks
    116
    Thanked 160 Times in 117 Posts
    Quote Originally Posted by Sportman View Post
    1,841,275 bytes, 40,026 sec., EMMA Text (Slow).
    1.841.275 would be nice, but I guess that EMMA compresses files with size >= 4 GiB incorrectly, and 1.841.275 is the compressed size of only the last part of the file.
    To check this, I tested EMMA on the first 5.000.000.000 bytes (5 GB) of super-wpalist.txt and the decompressed size was 705.032.704 (705.032.704 = 5.000.000.000 - 4.294.967.296 (4 GiB)), not 5.000.000.000.
    If 1.841.275 is the compressed size of 28.645.082.578 % 4.294.967.296 = 2.875.278.802 bytes, then 1.841.275 / 2.875.278.802 * 28.645.082.578 = 18.343.777, thus the real compressed size of super-wpalist.txt should be about 18.400.000 (I suspect it could be a few MB smaller).
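    For anyone who wants to double-check the arithmetic, here is a minimal Python sketch using only the figures quoted above:
    Code:
    # Figures quoted above
    FULL_SIZE   = 28_645_082_578   # size of super-wpalist.txt in bytes
    FOUR_GIB    = 4_294_967_296    # 2**32
    EMMA_OUTPUT = 1_841_275        # reported compressed size

    # What a 32-bit size field sees of the full file
    print(FULL_SIZE % FOUR_GIB)        # 2875278802

    # The 5 GB test: the size that actually came back out
    print(5_000_000_000 % FOUR_GIB)    # 705032704

    # Extrapolate the last-part ratio to the whole file
    print(round(EMMA_OUTPUT / (FULL_SIZE % FOUR_GIB) * FULL_SIZE))  # ~18.34 million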

  10. Thanks (2):

    schnaader (30th September 2016),Sportman (30th September 2016)

  11. #9
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    546
    Thanks
    203
    Thanked 796 Times in 322 Posts
    I honestly can't believe anyone would actually try to compress a 27 GB file with EMMA.
    EMMA stores the original filesize in 32 bits, so as Mauro correctly pointed out, it can't handle such large files.
    I never thought it would need more than that, since it is really slow and I don't want anyone to think it is even close to being stable and start using it "for real". I could switch to 64-bit file sizes, but I don't see the point; it might be better to simply add a check so it refuses to compress such large files.
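    Just a sketch of the kind of guard being described (not EMMA's actual code): refuse any input whose size would not fit in a 32-bit field.
    Code:
    import os

    UINT32_MAX = 2**32 - 1  # largest value a 32-bit size field can hold

    def check_input_size(path):
        """Raise an error for files whose size would overflow 32 bits."""
        size = os.path.getsize(path)
        if size > UINT32_MAX:
            raise ValueError(f"{path} is {size} bytes; files over 4 GiB are not supported")
        return size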

  12. #10
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by mpais View Post
    I honestly can't believe anyone would actually try to compress a 27GB file with EMMA
    I and others compress even much bigger files, but not with EMMA.

    It's good that Mauro pointed out this 4 GB limit; I never decompressed the challenge file and compared it.

  13. #11
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 324 Times in 197 Posts
    Quote Originally Posted by Dimitri View Post
    Someone had fun.

    I expect to see zstd with dictionary usage at some point.
    Small static dictionaries are not relevant for long files.

  14. #12
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    Sounds like ZCM is taking the glory here.

    And I expected 7z to do somewhat better!

  15. #13
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    278
    Thanks
    116
    Thanked 160 Times in 117 Posts
    I've tested CMV and EMMA on some blocks of 100.000.000 bytes.
    Code:
    Starting from 0 (the hardest block to encode IMHO; lines change from 1 to 2, 2 to 3, 3 to 4 characters):
     7.686 CMV
    35.714 EMMA
    Starting from 400.000.000 (the second hardest block to encode IMHO; lines change from 4 to 5 characters):
     5.746 CMV
    36.043 EMMA
    Starting from 1.000.000.000 (a block beyond 500.000.000; from there the compression ratio is stable and each block is similar to the others (*)):
     5.075 CMV (this is the only file whose decompression I checked, and it was OK)
    35.480 EMMA
    
    EMMA shows a stable compression ratio of 0.04%, CMV 0.0044-0.0045%.
    CMV is quite good on this kind of file (useless in real life :-( ), e.g.:
    LOG.txt and NUM.txt (Specific case - High redundant data in a pattern)
    19.897 LOG.cmv (-m0,0,0x036c29fe)
       726 NUM.cmv (-m1,2,0x01e1d9fa)
    
    I guess that CMV and EMMA compress super-wpalist.txt to about:
    CMV   7686 +  5500 * 3 +  5746 +  4500 * 281,45 =  ~1.296.457 bytes
    EMMA 35714 + 35000 * 3 + 36043 + 35000 * 281,45 = ~10.027.507 bytes
    
    For CMV I used the same best options found for NUM.txt: -m1,2,0x01e1d9fa (they may not be the best for super-wpalist.txt).
    The options for EMMA are attached to this post (the setting seems to be better than "Text (Slow)", but it may not be the best).
    
    (*) I don't see the point of compressing ~27 GB when the first 1 or 2 GB seems to be enough.
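    The same back-of-the-envelope estimate, as a small Python sketch (the per-block sizes are the measured and assumed figures from the block above):
    Code:
    # Estimated compressed size of super-wpalist.txt from 100 MB block samples
    remaining_blocks = 281.45   # ~28,645,082,578 / 100,000,000 minus the 5 blocks counted explicitly

    cmv  =  7_686 +  5_500 * 3 +  5_746 +  4_500 * remaining_blocks
    emma = 35_714 + 35_000 * 3 + 36_043 + 35_000 * remaining_blocks
    print(round(cmv), round(emma))   # ~1,296,457 and ~10,027,507 bytes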
    [Attached thumbnail: EMMA-super-wpalist.txt.png (the EMMA options used)]

  16. Thanks:

    RamiroCruzo (2nd October 2016)

  17. #14
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    538
    Thanks
    238
    Thanked 92 Times in 72 Posts
    I have been seeing that this community needs some fun
    This is actually pretty funny to me:
    Code:
    List.rar______98.046.036 (Original ~27 GB compressed and distributed)
    List.rar.bcm__26.516.650 (27%)
    List.rar.xz___32.441.644 (33%)
    In fact:
    Code:
    98.046.036 List.rar
    58.874.379 List.rar.gz
    54.476.308 List.rar.deflate64
    51.896.601 List.rar.bz2
    48.114.457 List.rar.bee
    41.856.463 List.rar.bssc2
    40.625.384 List.rar.quad
    38.020.901 List.rar.balz
    36.768.304 List.rar.bma
    36.225.587 List.rar.bssc
    35.337.255 List.rar.rolz3
    34.282.812 List.rar.ccmx
    32.914.462 List.rar.rzm
    32.898.711 List.rar.sqx
    32.441.644 List.rar.xz
    32.303.899 List.rar.lz
    30.143.315 List.rar.w
    29.604.342 List.rar.bit
    27.853.135 List.rar.bliz
    27.399.450 List.rar.nz
    27.391.076 List.rar.bbb
    26.516.650 List.rar.bcm
    More to come...
    A hint if anybody tries the same: block-sorting algorithms are by far the best for this particular file; the sliding-window family, not so much. Also, don't even think about bwmonstr... it's that slow! Really, much slower than paq*, and the ratio is bad too.
    Last edited by Gonzalo; 3rd October 2016 at 07:09.

  18. #15
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    223
    Thanked 146 Times in 83 Posts
    C:\>mcm -m8 g:\super-wpalist.txt c:\out
    ======================================================================
    mcm compressor v0.83, by Mathieu Chartier (c)2015 Google Inc.
    Experimental, may contain bugs. Contact mathieu.a.chartier@gmail.com
    Special thanks to: Matt Mahoney, Stephan Busch, Christopher Mattern.
    ======================================================================
    Compressing to c:\out mode=mid mem=8
    Analyzing
    27973713KB , 35939KB/s
    text : 1(26.6778GB)
    Analyzing took 778.395s

    Compressed metadata 15 -> 20

    Compressing text stream size=28,645,082,578
    Constructed dict words=32+8960+411480=420472 save=296460+12729910+5292202=18318572 extra=0 time=1.538s
    Dictionary words=420472 size=1.90441MB
    27973713KB -> 1793982KB 1972KB/s ratio: 0.06413
    Compressed 28,645,082,578 -> 1,837,037,641 in 14180.43100s

    Compressed 28,645,082,578->1,837,037,675 in 14961.418s bpc=0.513
    Avg rate: 522.303 ns/B
    MCM 0.83 compression results! This test is really hard!
    Last edited by Nania Francesco; 3rd October 2016 at 13:39.

  19. #16
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,976
    Thanks
    296
    Thanked 1,303 Times in 740 Posts
    Can somebody test this?
    Code:
    7z a -mf=off -mx=9 -mmt=1 -m0=lzma2 -m1=lzma2 archive file
    or maybe
    Code:
    7z a -mf=off -mx=9 -mmt=1 -m0=lzma2:d=3G -m1=lzma2 archive file

  20. #17
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by Dimitri View Post
    Sounds like ZCM is taking the glory here.
    Not quite:
    14,025,138 bytes, 11282.83 sec. - 11593.314 sec., WinZpaq - 9 - 12 threads

  21. #18
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    Quote Originally Posted by Sportman View Post
    Not quite:
    12,474,476 bytes, 25800.xxx sec. - 24375.938 sec., NanoZip - c - 12 threads

    Other tests:
    13,437,838,339 bytes, 57.584 sec. - 24.793 sec., Qpress - 3 - 12 threads
    2,329,731,083 bytes, 302.882 sec. - x.xxx sec., eXdupe - 3 - 12 threads
    584,672,811 bytes, 2600.053 sec. - x.xxx sec., 7-Zip - 9 - 12 threads
    263,940,776 bytes, 8132.926 sec. - 178.794 sec., FreeArc - 9 - 12 threads
    149,945,734 bytes, 310.399 sec. - 102.444 sec., WinRAR - 5 - 12 threads

  22. Thanks:

    Razor12911 (7th October 2016)

  23. #19
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    542,719,694 bytes, 50118.002 sec. - 72.052 sec., Brotli - 11 - 1 thread

  24. Thanks:

    Razor12911 (7th October 2016)

  25. #20
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    1,061,358,329 bytes, 27920.552 sec. - 132.046 sec., Zstandard - 22 - 1 thread

  26. Thanks:

    Razor12911 (7th October 2016)

  27. #21
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    1,694,640,566 bytes, 26086.628 sec. - 284.601 sec., Bzip2 - 9 - 1 thread

  28. Thanks:

    Razor12911 (7th October 2016)

  29. #22
    Member Dimitri's Avatar
    Join Date
    Nov 2015
    Location
    Greece
    Posts
    48
    Thanks
    21
    Thanked 30 Times in 14 Posts
    Quote Originally Posted by Shelwien View Post
    Can somebody test this? :
    Code:
    7z a -mf=off -mx=9 -mmt=1 -m0=lzma2 -m1=lzma2 archive file
    or maybe
    Code:
    7z a -mf=off -mx=9 -mmt=1 -m0=lzma2:d=3G -m1=lzma2 archive file


    Gotta try that out.

    I really love LZMA2!

