
Thread: HFCB: Huge Files Compression Benchmark

1. #1 Programmer Bulat Ziganshin

    HFCB: Huge Files Compression Benchmark

    http://freearc.org/HFCB.aspx

    i plan to add tests on huge games (prototype, COD) later

UPDATE 2015-01-06: i've finally found the original VMDK files: http://freearc.org/download/testdata...04.1_PE_x86.7z (just concatenate bagluxpe-s00*.vmdk to produce vm.dll)
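A minimal Python sketch of that concatenation step, assuming the split volumes sort correctly by name:
Code:
# Concatenate the split VMDK volumes into one file; chunked copying
# keeps memory use flat on multi-GB input.
import glob

parts = sorted(glob.glob("bagluxpe-s00*.vmdk"))
with open("vm.dll", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            while chunk := f.read(16 * 1024 * 1024):  # 16 MB at a time
                out.write(chunk)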

2. #2 Member Fu Siyuan
    Ha~! Can you give me results of my CSC3.1 with -m0/m1/m2 -d7?

3. #3 Tester Black_Fox
    Since almost all utilities there are used in multi-threaded mode, how about pbzip2? (Win32 compile download)
Original idea for the test, BTW!
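For context: pbzip2 gets its speed by cutting the input into independent blocks and compressing each as its own bzip2 stream in parallel; standard bzip2 tools can decompress the concatenated streams. A rough Python sketch of that idea (not pbzip2 itself; block size and worker count here are arbitrary):
Code:
# Rough sketch of the pbzip2 approach: compress fixed-size blocks as
# independent bzip2 streams in parallel, concatenate the results.
# Note: pool.map() consumes the input eagerly; a real tool would bound
# the number of in-flight blocks to cap memory use.
import bz2
from concurrent.futures import ProcessPoolExecutor

BLOCK = 8 * 1024 * 1024  # arbitrary block size for the sketch

def blocks(path):
    with open(path, "rb") as f:
        while b := f.read(BLOCK):
            yield b

def parallel_bzip2(src, dst, workers=4):
    with ProcessPoolExecutor(workers) as pool, open(dst, "wb") as out:
        for piece in pool.map(bz2.compress, blocks(src)):  # order preserved
            out.write(piece)

if __name__ == "__main__":
    parallel_bzip2("vm", "vm.bz2")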

4. #4 Programmer Bulat Ziganshin
    Black_Fox, i can't download it

5. #5 Programmer Bulat Ziganshin
    Quote Originally Posted by Fu Siyuan View Post
    Ha~! Can you give me results of my CSC3.1 with -m0/m1/m2 -d7?
    decompression cmdline?

6. #6 Programmer schnaader
    Quote Originally Posted by Bulat Ziganshin View Post
    Black_Fox, i can't download it
    Does this link work? From there, use the "Win32-Binaries" link. Otherwise, I could attach it, it's only 100 KB in size.

    BTW, since it's an Ubuntu image that perhaps contains some GZip/BZip2 streams, did you try FreeArc modes (Precomp + srep) or other compressors together with Precomp (Precomp + srep + 7-Zip)? This could be very slooow in compression (especially if slow mode is used), but could also reduce the compressed size (at least most Linux distribution ISOs can be reduced to 50-70% with it, don't know if it works for VM images as well).
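For anyone who wants to script that chain, a hedged sketch follows; the .pcf output name and the -slow switch appear elsewhere in this thread, but exact flags vary between tool versions, so treat the options as illustrative:
Code:
# Hedged sketch of the Precomp + SREP + 7-Zip chain; flag spellings
# differ between tool versions.
import subprocess

def run(*cmd):
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("precomp", "-slow", "vm")                    # -> vm.pcf (expands embedded streams)
run("srep", "vm.pcf", "vm.pcf.srep")             # long-range match/dedup pass
run("7z", "a", "-mx=9", "vm.7z", "vm.pcf.srep")  # final LZMA stage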

7. #7 Member Skymmer
    Quote Originally Posted by schnaader View Post
    BTW, since it's an Ubuntu image that perhaps contains some GZip/BZip2 streams, did you try FreeArc modes (Precomp + srep) or other compressors together with Precomp (Precomp + srep + 7-Zip)? This could be very slooow in compression (especially if slow mode is used), but could also reduce the compressed size (at least most Linux distribution ISOs can be reduced to 50-70% with it, don't know if it works for VM images as well).
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing.

EDIT: I'm now trying to PreComp that VM split into 10 TAR volumes, and I also got a crash on the 4th volume. It also seems it doesn't help much, because:
    Code:
    01.dat	424 673 280
    02.dat	424 673 280
    03.dat	424 673 280
    
    01.pcf	424 787 282
    02.pcf	438 558 203
    03.pcf	427 621 443

8. #8 Administrator Shelwien
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).

9. #9 Programmer Bulat Ziganshin
    Quote Originally Posted by Skymmer View Post
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing
did you try it with -t-j? most of the time it crashes due to packjpg.dll

10. #10 Programmer Bulat Ziganshin
    Quote Originally Posted by Shelwien View Post
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).
    original: http://174.36.1.2/bagluxpe.7z

11. #11 Programmer Bulat Ziganshin
    Quote Originally Posted by schnaader View Post
    Does this link work? From there, use the "Win32-Binaries" link. Otherwise, I could attach it, it's only 100 KB in size
    thanks, will test both pigz and pbzip2

i haven't tried precomp yet

12. #12 Programmer schnaader
    Quote Originally Posted by Skymmer View Post
    I tried it but unfortunately with no luck. Both 0.3.8 and 0.4.0 are crashing.

EDIT: I'm now trying to PreComp that VM split into 10 TAR volumes, and I also got a crash on the 4th volume. It also seems it doesn't help much, because:
    Code:
    01.dat	424 673 280
    02.dat	424 673 280
    03.dat	424 673 280
    
    01.pcf	424 787 282
    02.pcf	438 558 203
    03.pcf	427 621 443
Slow mode (for more matches) and -t-j (perhaps combined with -v, against the crashes) would be worth a (last) try. Ah, right, it's an open testset. I'll run some experiments myself.

    Quote Originally Posted by Shelwien View Post
    wonder if 7-zip would be able to extract it as a disk image.
    otherwise, file data might not be in sequence there (due to clusters).
This would be a good idea, indeed. 7-Zip can open almost everything, so there should be a good chance. Another idea would be to check if there's special software (or a function inside the virtual machine) for converting the image file.

13. #13 Programmer Bulat Ziganshin
Quote Originally Posted by schnaader View Post
Ah, right, it's an open testset. I'll run some experiments myself.
if you do, please post results here so i can use the best modes and know how much time it will need and how much compression it achieves

14. #14 Programmer Bulat Ziganshin

15. #15 Member Fu Siyuan
    Quote Originally Posted by Bulat Ziganshin View Post
    decompression cmdline?
The same as for compression. The header of the file decides the behavior.
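In other words, one binary serves both directions by tagging its output. CSC's actual header format isn't shown in this thread, so this is a purely hypothetical sketch of the technique, with zlib standing in for the codec and an invented magic signature:
Code:
# Hypothetical sketch of header-based dispatch; MAGIC and the zlib codec
# are stand-ins, not CSC's actual format.
import zlib

MAGIC = b"CSCX"  # invented signature

def process(src, dst):
    with open(src, "rb") as f:
        data = f.read()
    if data.startswith(MAGIC):        # header says "compressed": decompress
        out = zlib.decompress(data[len(MAGIC):])
    else:                             # otherwise compress and tag the output
        out = MAGIC + zlib.compress(data, 9)
    with open(dst, "wb") as f:
        f.write(out)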

16. #16 Member Skymmer
    Quote Originally Posted by Bulat Ziganshin View Post
    i plan to add tests on huge games (prototype, COD) later
Bulat, if by COD you mean Call of Duty: Modern Warfare or Call of Duty: Modern Warfare 2, then I can give you some advice: don't waste your time. All the content of these games is already compressed. For example, MW2 consists of 1.96 GB of movies in BIK format, 4.73 GB of IWD files (actually ZIP files), and 4.36 GB of FF files, which are ZLIB-packed.
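A cheap way to verify this before committing to a long benchmark run is to check known container signatures and compress a small sample; a ratio near 1.0 means there is little left to gain. A quick sketch (the signature list is partial and illustrative):
Code:
# Quick triage for already-compressed data: known container signatures
# plus a sample compression ratio. Signature list is partial/illustrative.
import zlib

SIGS = {b"PK\x03\x04": "ZIP/IWD", b"BIK": "Bink video", b"\x1f\x8b": "gzip"}

def triage(path, sample=1 << 20):
    with open(path, "rb") as f:
        head = f.read(sample)
    for sig, name in SIGS.items():
        if head.startswith(sig):
            return f"{name} (already compressed)"
    ratio = len(zlib.compress(head, 6)) / max(len(head), 1)
    return f"sample compresses to {ratio:.2f} of original size"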

EDIT: Also, how about adding PAQ8px to the competitors?

17. #17 Member
    console games should be okay to test

18. #18 Member
    Hello,

Seems you mixed up testing time and extraction time in reporting decompression time for your short table. Or did I miss the point?

    Interesting benchmark starting here, for me at least!


    AiZ

19. #19 Programmer Bulat Ziganshin
    HFCB: added CSC and more FreeArc modes

Quote Originally Posted by AiZ View Post
Seems you mixed up testing time and extraction time in reporting decompression time for your short table
decompression time = testing time. my HDD is slow and my CPU is fast, so extraction time has much more overhead than on an average system

Quote Originally Posted by Skymmer View Post
how about adding PAQ8px
who will volunteer to test it? i can do it myself later if it can finish overnight

Quote Originally Posted by Skymmer View Post
For example, MW2 consists of 1.96 GB of movies in BIK format, 4.73 GB of IWD files (actually ZIP files), and 4.36 GB of FF files, which are ZLIB-packed.
    one more reason to add stdin-to-stdout mode to precomp+freearc

20. #20 Member Fu Siyuan
    Quote Originally Posted by Bulat Ziganshin View Post
    HFCB: added CSC and more FreeArc modes

Thanks all the same, though I originally meant "-m0/m1/m2" -d7.

21. #21 Expert Matt Mahoney
    Results for vm
    zpaq ocmid.cfg (111 MB) -> 982439831, 11251 sec.
    zpaq ocmax.cfg,3 (1861 MB) -> 888920458, 27373 sec.
    Decompression not tested yet. Will test tonight.

    Test machine: Dual core T3200, 2.0 GHz, 3 GB, Vista 32 bit,
    ZPAQL compiled with MinGW g++ 4.4.0 -O2 -s -fomit-frame-pointer -march=pentiumpro -DNDEBUG -DOPT

    Since zpaq runs on 1 core, I ran both compression programs at the same time overnight. Times are wall times.

    EDIT: decompression OK. mid.cfg = 10097 sec, max.cfg,3 = 25952 sec, wall times, done one at a time.
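For anyone reproducing these numbers, the wall-time measurement is easy to script; the zpaq invocation below just mirrors the post's ocmid.cfg form and may need adjusting for other zpaq versions:
Code:
# Minimal wall-clock timing of an external compressor; the zpaq command
# mirrors the post above and may differ for other zpaq versions.
import subprocess, time

def wall_time(cmd):
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - t0

secs = wall_time(["zpaq", "ocmid.cfg", "vm.zpaq", "vm"])
print(f"{secs:.0f} sec wall time")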

22. #22 Programmer Bulat Ziganshin
    HFCB: added more zip/bzip2 modes; 7-zip results updated, now these are much better since i've renamed vm to vm.dll

23. #23 Programmer schnaader
    Quote Originally Posted by Bulat Ziganshin View Post
    HFCB: added more zip/bzip2 modes; 7-zip results updated, now these are much better since i've renamed vm to vm.dll
Interesting, I didn't know 7-Zip had some detection based on the file extension. It doesn't feel right, though; it should depend on the actual content rather than on filenames or extensions... How big is the difference?

24. #24 Member Skymmer
I have an 806 MB result on HFCB (VM). I haven't measured the time of the whole process because it's useless anyway due to the completely different CPU, but the process is asymmetric and requires only 700 MB of memory on compression and 64 MB on decompression. But let's go by the numbers.
The idea of PreComp-ing that VM file hooked me completely. Turning off the JPEG recompression was not a solution for me, and I didn't even try it. Furthermore, most of the problems were related to ZLIB and GIFs. So my strategy was to find all the problematic offsets and create an ignore list. After a whole night of testing, I finally managed to make PreComp 0.3.8 work without crashes. The final command line looks like this:

    precomp -v -slow -i619716256 -i620119742 -i733687954 -i733280138 -i733841552 -i734911416 -i1212229222 -i1319302591 -i1319303624 -i1325620736 -i1623902430 -i1637846002 -i2231172608 vm

    Then SREP and 7z -mx=9

    Some stats:
    Code:
      Original: 4 244 176 896
     PreComped: 4 946 444 864
    After SREP: 3 391 459 732
      Final 7z:   845 818 619
    Some notes:
- Results with Precomp 0.4.0 could be much better due to recursion, but I'm not going to repeat the same search scheme with it. You have a clue now, so feel free to try.
- Precomping took more than 2 hours on my AMD64 4000+, with a noticeable slowdown on multi-PNG files.
- "Bad" offsets appeared very chaotically. For example, I found bad offset 734911416; in the next pass, bad offset 733280138 appeared. It's a mystery to me why it didn't show up in the previous pass.

Bulat, if you're going to include Precomp-based results, then I'm really curious whether the command line given above will work on your system.
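If Precomp reports the failing position before it dies, that offset hunt could be automated; a hypothetical sketch (the output format it parses is a guess, and a hard crash may leave no offset to harvest):
Code:
# Hypothetical automation of the ignore-list hunt above: rerun Precomp,
# harvest the failing offset from its output, add -i<offset>, repeat.
# The "(offset)" output format parsed here is a guess.
import re, subprocess

def build_ignore_list(target="vm", max_rounds=50):
    ignore = []
    for _ in range(max_rounds):
        cmd = ["precomp", "-v", "-slow", *(f"-i{o}" for o in ignore), target]
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return ignore                        # clean run, done
        m = re.search(r"\((\d+)\)", proc.stdout)
        if not m:
            raise RuntimeError("no offset recoverable from output")
        ignore.append(int(m.group(1)))
    raise RuntimeError("too many bad offsets")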

25. #25 Programmer schnaader
    Really nice results, good work

I made some attempts, too, and I managed to get a Precomp 0.4.1 development version to run without a crash (there's no big difference from 0.4, so this should work for it, too). Using -t-j crashed, but using both -t-j and -v seemed to work. But I realized halfway through the file (which took very long; blame recursion, slow drives, and still using the PC for other things, I guess) that the output drive I'm using is formatted as FAT32, so the output would be corrupted, as this filesystem has a 4 GB file-size limit and the output will surely be more than 4 GB.

By the way, recursion really should help a bit; the highest level I've seen in the log file so far is level 2, which occurs quite often.

    I'm splitting the input file into 700 MB parts now and will try again.
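A trivial splitter for that step (700 MB per the post; names and chunking are arbitrary):
Code:
# Split a big file into fixed-size parts (700 MB per the post). Reads one
# part at a time into memory; stream in smaller chunks on low-RAM machines.
def split(path, part_size=700 * 2**20):
    with open(path, "rb") as src:
        n = 0
        while part := src.read(part_size):
            with open(f"{path}.{n:03d}", "wb") as out:
                out.write(part)
            n += 1

split("vm")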

26. #26 Programmer Bulat Ziganshin
    Quote Originally Posted by schnaader View Post
Interesting, I didn't know 7-Zip had some detection based on the file extension. It doesn't feel right, though; it should depend on the actual content rather than on filenames or extensions... How big is the difference?
    Code:
    7-zip 9.07 [64]
     -mx                          987261165 1329.529 73.671 149.803
     -mx -md128m                  978210220 1446.626 72.579 151.746
     -mx -md256m                  971648644 1573.823 72.525 156.832
    7-zip 9.07 [64] (with BCJ)
     -mx                          960869320 1210.284 87.194 107.865
     -mx -md128m                  951249799 1313.346 86.421 105.929
     -mx -md256m                  945521722 1417.122 86.278 106.142
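For reference, BCJ rewrites relative x86 CALL targets into absolute addresses, so repeated calls to the same function become identical byte strings that LZMA can match. A simplified sketch of the encode direction (real BCJ also handles jumps and filters unlikely targets, and is not byte-compatible with this):
Code:
# Simplified sketch of the BCJ x86 filter idea: convert CALL rel32
# displacements to absolute targets.
import struct

def bcj_encode(data: bytes) -> bytes:
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:  # CALL rel32 opcode
            rel = struct.unpack_from("<i", buf, i + 1)[0]
            target = (rel + i + 5) & 0xFFFFFFFF   # absolute target address
            struct.pack_into("<I", buf, i + 1, target)
            i += 5
        else:
            i += 1
    return bytes(buf)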

27. #27 Programmer Bulat Ziganshin
    HFCB: updated results of rar, csc, nz

28. #28 Member m^2
Bulat... I see the benchmark. But where are the huge files?
Seriously, nowadays I wouldn't call a file under 1 TB "huge".

29. #29 Tester Black_Fox
    Since "large" files are 100 MB - 1 GB and most used benchmark corpora weight in order of megabytes, then yes, this is "huge".

30. #30 Member m^2
    Quote Originally Posted by Black_Fox View Post
    Since "large" files are 100 MB - 1 GB and most used benchmark corpora weight in order of megabytes, then yes, this is "huge".
    "Huge" suggests something unusual.
There's nothing unusual about 4 GB files; everybody has dealt with hundreds of them.
And actually, I'm surprised that you call a 100 MB file large. For me it's well within the average. It seems our scales differ a lot.
