
Thread: Start of another BIG and really real Benchmark

  1. #1
    Simon Berger (Member, Germany, Hamburg)
    Hi guys, I have followed the 'compression scene' for some years. The time will come when I start my own tool, but for now I want to start a benchmark that covers nearly all data types.

    The facts:
    - 1.35 GB in size
    - files from many programs, games and personal data, so it won't be public
    - many unknown formats, pictures, exe/dll files...

    Because this is a practical test, programs based on PAQ are out of the fight! lpaq and ccm are in, but they are close to the line.
    I hope for some reactions about what I may still have to take care of, because I can change the test set until I start testing.

    The same goes for the text data, but the most important thing for me is to focus on decompression speed. I will try to find the best ratio between speed and file size. Compression time also counts, but it shouldn't be weighted too heavily. If PAQ decompressed as fast as RAR, it would be ranked first (just so you see what I mean).
    If you have ideas on how to compute a ranking number, post them here.

    EDIT:
    What I forgot: I have a powerful system, but one with only 2.4 GHz (x4), so multithreaded compression tools get a big step ahead.
    Q6600 @ 2400 MHz
    4 GB RAM
    GeForce 8800 GTS (for documentation)

    I want to limit the memory usage to 64, 128 or 256 MB (not yet decided), because you need the same amount for decompression and you cannot expect everyone to spend >256 MB on a single file.

  2. #2
    Bulat Ziganshin (Programmer, Uzbekistan)
    I suggest using the formula from the MFC test, but replacing compression time with something more appropriate. Also, his formula is "10% better compression is worth 2x more time", which favors rather fast compressors (such as SBC and RAR). I think "3x = 10%" or even "4x = 10%" would be more appropriate from the point of view of most users who read these tests.

    Instead of compression time, I suggest using "compression time + 3*decompression time". This will not over-favor the fastest decompressors (I don't see any practical difference between RAR and 7-Zip) while still assigning a larger weight to the more frequent operation.
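
    As an illustration of this kind of scoring (a minimal sketch, not the actual MFC formula; the function name, the power-law form and the example numbers are assumptions made for illustration), one way to encode "a 10% smaller archive is worth 3x more time", with decompression weighted 3x, is:
    Code:
    import math

    def efficiency_score(compressed_size, compression_time, decompression_time,
                         time_factor=3.0, size_gain=0.10):
        """Lower is better.  Encodes "a `size_gain` (10%) smaller archive is
        worth `time_factor` (3x) more time": a result that is 10% smaller but
        3x slower gets the same score.  Decompression is weighted 3x because
        it is the more frequent operation."""
        total_time = compression_time + 3.0 * decompression_time
        # Exponent chosen so that (1 - size_gain) * time_factor**k == 1.
        k = -math.log(1.0 - size_gain) / math.log(time_factor)
        return compressed_size * total_time ** k

    # Made-up example: two results on the same test set.
    a = efficiency_score(400e6, 120.0, 30.0)   # smaller archive, slower
    b = efficiency_score(450e6,  60.0, 15.0)   # larger archive, faster
    print("A wins" if a < b else "B wins")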

    About the data compressed:
    1. No multimedia data. Otherwise the level of MM compression becomes the most important ranking factor. You can see it in the MFC test: it contains only 10% MM data, but adding MM compression improves the overall ratio by 8% (!!!), which is close to the difference between RAR and 7-Zip, for example.
    2. The data should not be "unknown". Most of the time users compress very well known data types such as exe and html. A great test should include all widespread types of data in the right proportions.
    3. One possible solution is to make several data sets such as "executable files", HTMLs, C++ sources and so on, and provide compression results for each category separately, plus overall size/time/efficiency fields like it is done in Squeeze Chart.
    4. Preferably, the test should be done on a 2-core processor, ideally a Core 2. This processor is the most common in modern systems.
    5. As Uwe said, please add columns which mention the main archiver features:
    - compressor / archiver / archiver that supports archive updates
    - availability of 64-bit and Linux versions, how many processors are supported
    - GUI, AES encryption, multi-volumes, SFX/installer, data recovery

    6. Ideally, it should be possible to sort the table by times/size/efficiency, change the coefficients of the efficiency formula, and enable/disable separate data sets in the efficiency calculation.
    7. Test various compression modes (not only the highest one), like it's done in MFC. As a rule of thumb, if two modes differ 1.5x in speed, both should be added to the table.

    These are the minimum requirements for a successful candidate for the name "MONSTER OF COMPRESSION TESTS".

  3. #3
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    I want to limit the memory usage to 64, 128 or 256 MB (not yet decided), because you need the same amount for decompression and you cannot expect everyone to spend >256 MB on a single file.
    You may limit decompression memory to these values. Don't forget that 7-Zip, for example, uses about 10x more memory for compression than for decompression.

    IMHO, decompression memory should be limited to 200 MB. Less than that is too small for modern archivers, and we are testing mid-level compressors (neither PAQ nor zip), so we should only ask for the ability to decompress on medium-sized systems.
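
    As a rough back-of-the-envelope check (a minimal sketch assuming an LZMA-like coder where decompression memory is roughly the dictionary size and compression needs about 10x more, per the rule of thumb above; the function name is made up):
    Code:
    def memory_estimate_mb(decompression_limit_mb=200, compression_overhead=10.0):
        """Rough estimate for an LZMA-like coder: the decompressor needs about
        the dictionary size, the compressor about `compression_overhead` times
        more (the 10x rule of thumb above)."""
        dictionary_mb = decompression_limit_mb
        compression_mb = dictionary_mb * compression_overhead
        return dictionary_mb, compression_mb

    # A 200 MB decompression cap still allows ~2 GB on the compressing machine.
    print(memory_estimate_mb(200))   # -> (200, 2000.0)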

  4. #4
    Simon Berger (Member, Germany, Hamburg)
    @Bulat
    I'm with you on most points.
    I plan to build a dynamic site (in the future) with a history of all tests (all versions), and I will also list whether a program is a real archiver and whether it has the important options.

    < You may limit decompression memory to these values. Don't forget
    < that 7-Zip, for example, uses about 10x more memory for compression
    < than for decompression.

    Sure, if that is the case I will use that value as the decompression limit.

    < Questions

    I have multimedia files: jpg, mp3, avi... but not many of them. I only make this one set, because more than that would be too much work.
    There are many known files, but every archiver should handle unknown binary files, and they shouldn't show such a big difference there.
    Because it's a benchmark for real use, I surely won't ever use the highest compression level. Maybe sometimes I will test more levels.
    Maybe I will include more HTML and PDF files.

  5. #5
    Bulat Ziganshin (Programmer, Uzbekistan)
    there is "quote" link at the beginning of each message. you can mouse-select text you want to quote and press this button

    Quote Originally Posted by Simon Berger
    I have multi media files: jpg, mp3, avi...
    jpg, mp3, pdf, zip, docx and other formats that compressed at least by some programs are ok. but you should include small amount of data of each type and overall amount of uncompressible data should be no more that 10-20%. ortherwise, you will mainly test how archivers handle uncompressible data and thats not the same as testing compression. avi and other formats that cant be compressed by anything is not of much interest

    Quote Originally Posted by Simon Berger
    There are many known files, but every archiver should handle binary unkown files and they shouldn?t show such a big difference.
    compressing unusual formats isnt the same as compressing many usual formats. in the former case some accidental aspect of this file may favor one compressor what dont have good compression of real data. so its better to provide many various standard formats rather that compress something unusual

    Quote Originally Posted by Simon Berger
    Maybe I will include more html and pdf files.
    htmls, sources, natural language texts is a must. such data types are often compressed in home/business environment. also doc/docx, xls, mdb/mailbase and so on. as you see, there are many known and well-spread datatypes. testing of files that you just found at hand on occasion is bad idea

    Quote Originally Posted by Simon Berger
    Because it?s a benchmark for real use I surely won?t ever use the highest compression level. Maybe sometimes I will test more.
    freearc, for example, includes 5 compression levels - from one competing with zip and targeted to fast backups, to one competing with ccm. i dont see reasons to not test them all

    Quote Originally Posted by Simon Berger
    I plan to do a dynamic site (in the future) with a history of all tests (all versions),
    yes, its one more important feature - by default display only results for latest archiver versions but allow to enable displaying all versions. unfortunately, all exisiting tests are static and dont allow user to select what he want to view

  6. #6
    Simon Berger (Member, Germany, Hamburg)
    I think I will start choosing the files again; the set should also not exceed 800 MB. I will think more about which file types I use, but I will keep unknown files. I will use many types, so there are no real accidents.

    Maybe you could write your opinion about the percentages of file types.
    20% text files, 20% unknown files, 10% executable files (dll, exe), 20% MM files (bmp, jpg, wav, mp3, wmv...). I find it hard to think in such numbers. I have some Microsoft file formats (Visual Studio, MSDN...) which are half text, half binary, and some zip engines and jar files which seem to compress not badly. (A quick check of such a size breakdown is sketched below.)
    I will show results of 7-Zip and UHARC on the first test set tomorrow so you can get a picture.
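
    As a quick way to check such a breakdown (a minimal sketch using only the Python standard library; the directory path in the example is hypothetical):
    Code:
    import os
    from collections import defaultdict

    def size_share_by_extension(root):
        """Walk `root` and return {extension: fraction of total bytes},
        largest share first.  Handy for checking a planned mix such as
        20% text / 10% exe / 20% MM, or the suggested 10-20% cap on data
        that is already compressed."""
        sizes = defaultdict(int)
        total = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                ext = os.path.splitext(name)[1].lower() or "<none>"
                n = os.path.getsize(os.path.join(dirpath, name))
                sizes[ext] += n
                total += n
        if total == 0:
            return {}
        return {ext: size / total
                for ext, size in sorted(sizes.items(), key=lambda kv: -kv[1])}

    # Hypothetical usage:
    # for ext, share in size_share_by_extension(r"D:\testset").items():
    #     print(f"{ext:10s} {share:6.1%}")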

  7. #7
    Black_Fox (Member)
    Quote Originally Posted by Bulat Ziganshin
    all existing tests are static
    Wrong.

  8. #8
    Tester (St-Petersburg, Russia)
    Quote Originally Posted by Black_Fox
    Quoting Bulat Ziganshin: "all existing tests are static". Wrong.
    Thanks, Black_Fox!!! Your site is very interesting!

  9. #9
    Bulat Ziganshin (Programmer, Uzbekistan)
    First, which archivers/modes to test:

    lprepaq, lpaq7, ccm, ccmx
    durilca -t1
    durilca'light with and without -t1 switch
    uharc -mz -md32768, -m2 -md32768, -mx -md32768
    7-zip -mx1, -mx3, -mx5, -mx9 -md128m, -mx9/ppmd with manual selection, bzip2 (4-threaded)
    freearc -m1..-m5, -m3x, -m4x, -m8x
    rar -m1 -md32768, -m3 -md32768, -m5 -md32768, -m5 -md32768 -mc14:128t
    sbc -m1 -b5, -m2 -b15, -m3 -b63
    squeez -s -m5 -au1 -fme1 -fmm1 -ppm1 -ppmm48 -ppmo10 -rgb1 -uxx1
    winrk best asymmetric rolz3, normal rolz3, fast rolz3
    thor e1, e3, e4
    tornado 0.3 -3, -5, -7
    slug
    pkzip 2.50/win32 -speed, -fast, [normal], -maximum
    ADDED: zip -1, normal, -9; bzip2 - for comparison

  10. #10
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    I will show results of 7-Zip and UHARC on the first test set tomorrow so you can get a picture.
    I wonder what your testing tools are? If you have to run each test by hand, you will end up with a large amount of manual work, and soon you will stop refreshing the test because you will lose interest.

    A good test should include two automation tools:
    First, a database of the compressors and modes we are going to test, plus a script which runs the test on the specified compressors/modes and saves the results into another database. I have a sort of such testing tool; look for maketest.rb in the FreeArc distribution.
    Second, a tool that uses the database of results to generate various reports, and finally a dynamic web page which provides an interface to this report generator.

    Also, don't forget to establish exact testing rules about data caching; otherwise the results of fast compressors may fluctuate. I propose the following (a minimal runner sketch follows after this list):
    1) reboot the computer before each test
    2) measure the real (wall-clock) time of compression and decompression, from disk to disk
    3) include tarring times for single-file compressors (and use them in pipe mode when possible)
    4) establish a standard tarring utility and its switches
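
    As a sketch of the first automation tool described above (a minimal illustration; maketest.rb itself is not reproduced here, and the mode table, command lines, file names and CSV layout are assumptions made for this example):
    Code:
    import csv
    import os
    import subprocess
    import time

    # Hypothetical mode table: (label, compress command, decompress command).
    # {set}, {arc} and {tmp} stand for the test set, the archive and a scratch
    # directory; the switches are only examples, not a recommended mode list.
    MODES = [
        ("7zip_mx1", "7z a -mx1 {arc} {set}", "7z x -y -o{tmp} {arc}"),
        ("7zip_mx9", "7z a -mx9 {arc} {set}", "7z x -y -o{tmp} {arc}"),
    ]

    def run_timed(cmd):
        """Run one command and return its wall-clock time in seconds
        (rule 2: real disk-to-disk time)."""
        start = time.time()
        subprocess.run(cmd, shell=True, check=True)
        return time.time() - start

    def benchmark(testset, results_csv="results.csv", tmp="unpacked"):
        """Tiny 'results database': one CSV row per mode with archive size,
        compression time and decompression time.  Rule 1 (rebooting, or at
        least flushing the OS file cache, before each run) is not handled
        here and has to be done outside this script."""
        with open(results_csv, "a", newline="") as f:
            writer = csv.writer(f)
            for label, comp, decomp in MODES:
                subst = {"set": testset, "arc": label + ".7z", "tmp": tmp}
                ctime = run_timed(comp.format_map(subst))
                dtime = run_timed(decomp.format_map(subst))
                writer.writerow([label, os.path.getsize(subst["arc"]), ctime, dtime])

    # benchmark(r"D:\testset.tar")   # hypothetical path to a tarred test set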

  11. #11
    Nania Francesco (Tester, Italy)
    If I started the "Monster of Compression" benchmark, it is really because I hoped this would happen! There is a need for a benchmark that highlights the merits and defects of all compression programs to date! Good luck!

  12. #12
    Nania Francesco (Tester, Italy)
    What is the state of the new BIG benchmark?

  13. #13
    Simon Berger (Member, Germany, Hamburg)
    Hi, I have some other projects at the moment, but it's still alive. The only problem I have is that I don't know which test set(s) would be best. Another reason why there are no results yet is that others are already using the ideas from this and another thread.
    I will come up with news later.

  14. #14
    Nania Francesco (Tester, Italy)
    Thanks! Hi

  15. #15
    encode (The Founder, Moscow, Russia)
    I suppose the test set should contain various types of data, to see how each algorithm/program responds to each of them.

    I guess there are four basic data-type groups:
    1. Text
    2. Binary
    3. Analog
    4. Random/Compressed

    Alternatively, for your large test set you could choose a resource file from a modern game like F.E.A.R.

    Game resources should contain all data types.


  16. #16
    Bulat Ziganshin (Programmer, Uzbekistan)
    There are still no benchmarks where archivers are rated by efficiency, except for MFC.

  17. #17
    nimdamsk (Member, Moscow)
    I propose combining different data types into several test sets - gamer, programmer, administrator, filmmaker, DJ, etc.

  18. #18
    Nania Francesco (Tester, Italy)
    Good Idea!

  19. #19
    Simon Berger (Member, Germany, Hamburg)
    There is another "problem". For me the winner in all categories is obvious: 7Zip. It combines excellent compression with superb fast decompression (nearly no other program has such a big different between compression- and decompression speed). Also I have a quadcore and it can use dual core power so it is again much faster in compression speed.
    Testing an unimportant field of high compression and very slow compression or low compression and low speed . Isn?t really interesting especiallyfor a "real world test".

  20. #20
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    7-Zip.
    Did you try FreeArc?

  21. #21
    nimdamsk (Member, Moscow)
    Quote Originally Posted by Simon Berger
    For me the winner in all categories is obvious: 7-Zip.
    Can't agree. FreeArc is better almost always. Besides, there are SBC and UHA with very good multimedia filters. And the speed of CCM is acceptable sometimes.

  22. #22
    encode (The Founder, Moscow, Russia)
    Quote Originally Posted by Simon Berger
    7-Zip.
    Yep, 7-Zip lacks multimedia compression. Furthermore, don't think that LZMA, the main algorithm of 7-Zip, is almighty. There is no practical algorithm that can compress ALL data types with the same efficiency. LZ77, PPM, ... all algorithms have their own features, advantages and disadvantages. In conclusion, each algorithm has its own purposes and its own niche.

    Try comparing LZMA (Fastest) and LZPM (levels 1 and 2). LZPM should be more efficient in terms of compression ratio vs. time because of its ROLZ nature. The point of ROLZ is fast LZ-based compression.

  23. #23
    Simon Berger (Member, Germany, Hamburg)
    Quote Originally Posted by Bulat Ziganshin
    Did you try FreeArc?
    Yes.

    Quote Originally Posted by nimdamsk
    Can't agree. FreeArc is better almost always. Besides, there are SBC and UHA with very good multimedia filters. And the speed of CCM is acceptable sometimes.
    UHARC has been my most beloved archiver for years. Its compression is almost always better than 7-Zip's, but not by much. 7-Zip's decompression speed, and also its compression speed, more than makes up for the difference (UHA's problem is that its decompression is not much faster than its compression).
    FreeArc is a candidate I have to test against 7-Zip. Because it uses the same library for some parts, it could beat it. SBC is indeed one of the strongest opponents. For CCM the same applies as for UHARC; also, like SBC, it has no archiver features.

    All in all, yes, you listed all the archivers that are interesting for me at the moment. You could sometimes add Durilca'light (and surely add projects like Precomp and PackJPG).


    EDIT:
    I like nimdamsk's idea of having "situation packs" :-D There are two things on my mind:

    1. A compiled VC2005 project. It combines exe/object-file and text-file compression (maybe too many object files).
    2. A game with many unknown files.

  24. #24
    nimdamsk (Member, Moscow)
    My idea is to combine different data types into a rather large number of subgroups (exe, txt, log, htm, wav, raw, bmp, etc.) and build higher-level logical groups from selected subgroups. It would be even better to let the user build his own groups.

  25. #25
    nimdamsk (Member, Moscow)
    Quote Originally Posted by Simon Berger
    FreeArc is a candidate I have to test against 7-Zip. Because it uses the same library for some parts, it could beat it.
    FreeArc's main advantage shows on big (>=300 MB) file sets with repetitive structure; it is mainly because of REP. I was shocked when I got the results of some testing: FreeArc compressed 8-9 times better than the best competitors in its class.

    The lack of archiving functions in CCM is not a problem; the author will add them some day.

    UHARC has slow decompression only when using PPM; ALZ and LZP are both fast in decompression. The main strength of UHARC, compression of multimedia data, doesn't suffer much when using ALZ instead of PPM.

  26. #26
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    All in all, yes, you listed all the archivers that are interesting for me at the moment.
    Different users have different needs. I listed above the main archivers for various speed/compression niches and data types.

  27. #27
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    1. A compiled VC2005 project. It combines exe/object-file and text-file compression (maybe too many object files).
    Only sources + executables.

    Quote Originally Posted by Simon Berger
    2. A game with many unknown files.
    Too little text data and too much multimedia.

  28. #28
    Simon Berger (Member, Germany, Hamburg)
    Quote Originally Posted by Bulat Ziganshin
    Only sources + executables.
    Quote Originally Posted by Bulat Ziganshin
    Too little text data and too much multimedia.
    You didn't see the idea behind it. We are talking about realistic sets you would actually compress. I would use both packs, so you have everything.

    Quote Originally Posted by Bulat Ziganshin
    Different users have different needs. I listed above the main archivers for various speed/compression niches and data types.
    Yes, sometimes decompression time isn't as important as the size of the compressed data. But in times of faster and faster internet, compression ratio loses the focus.

  29. #29
    Bulat Ziganshin (Programmer, Uzbekistan)
    Quote Originally Posted by Simon Berger
    You didn't see the idea behind it. We are talking about realistic sets you would actually compress. I would use both packs, so you have everything.
    I see it very clearly. If these are only two sets among 10-20, they are not very helpful; it is better to test sources, executables, etc. separately. If they are the only two tests, then they don't contain many types of data.

    Quote Originally Posted by Simon Berger
    But in times of faster and faster internet, compression ratio loses the focus.
    Yes, of course. If you are not interested in compression, then don't test compressors at all.

  30. #30
    Simon Berger (Member, Germany, Hamburg)
    You are too stuck in your own way of thinking. What I wanted to say with my sentence is: a good compression algorithm is useful in any case. Let me give you some examples.

    1. I might have 1000 small files which are hard to compress, but I want to compress them anyway because transmitting one file is much faster and easier to handle.
    2. If I have a mod with some well and some badly compressible files and I want to send it to a friend, I have to work out the best trade-off between the time I need for compression and the time the file needs to transmit (see the rough calculation below).
    3. For files that go to many people, it's obvious that I need a compression method which is common (but that shouldn't influence the test).

    I like compression very much; it's the reason why I am here. But in my eyes time is too important. I can think of few scenarios today where you can wait a long time for compression without losing time overall.
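
    To make the trade-off in point 2 concrete (a minimal sketch; the sizes, speeds and bandwidth are made-up numbers, not measurements from this thread):
    Code:
    def total_transfer_time(original_mb, ratio, compress_mb_s, decompress_mb_s, upload_mb_s):
        """Seconds to compress, upload the compressed file and decompress it.
        `ratio` is compressed size divided by original size."""
        compressed_mb = original_mb * ratio
        return (original_mb / compress_mb_s
                + compressed_mb / upload_mb_s
                + compressed_mb / decompress_mb_s)

    # Made-up numbers: a 700 MB mod sent over a 1 MB/s upload link.
    fast   = total_transfer_time(700, 0.50, compress_mb_s=10,  decompress_mb_s=40, upload_mb_s=1)
    strong = total_transfer_time(700, 0.42, compress_mb_s=1.5, decompress_mb_s=20, upload_mb_s=1)
    print(f"fast mode: {fast/60:.1f} min, strong mode: {strong/60:.1f} min")

    With these particular numbers the faster mode wins despite the larger archive, which is exactly the point: the extra compression time can cost more than the bytes it saves.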
