
Thread: Multithreading experiment

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts

Multithreading experiment: Zhuff 0.7

    Hi

    Following this previous post at encode.su, I've been mostly toying with multi-threading these last few weeks.
    For the time being, parallel chunk compression has been the main implementation choice, due to its design simplicity. It also spreads very efficiently over many cores.
    The result is Zhuff, a half-decent compressor which tries to make up in speed for what it cannot provide in ratio.
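
    To give an idea of the approach, here is a minimal sketch of parallel chunk compression. This is not Zhuff's actual source: the compress_chunk() function, the pthread usage and the worst-case output bound are assumptions, just enough to show independent chunks handled by independent threads.
    Code:
    /* Minimal sketch of parallel chunk compression (not Zhuff's actual code):
       the input buffer is split into independent chunks, and each chunk is
       compressed by its own thread into its own output buffer. */
    #include <pthread.h>
    #include <stdlib.h>

    /* hypothetical single-threaded compressor: returns compressed size */
    extern size_t compress_chunk(const char* src, size_t srcSize, char* dst);

    typedef struct {
        const char* src;
        size_t      srcSize;
        char*       dst;
        size_t      dstSize;
    } chunk_job;

    static void* worker(void* arg)
    {
        chunk_job* job = (chunk_job*)arg;
        job->dstSize = compress_chunk(job->src, job->srcSize, job->dst);
        return NULL;
    }

    /* Splits srcSize bytes into nbThreads chunks and compresses them in parallel */
    void compress_parallel(const char* src, size_t srcSize, int nbThreads)
    {
        pthread_t th[nbThreads];
        chunk_job job[nbThreads];
        size_t chunkSize = (srcSize + nbThreads - 1) / nbThreads;

        for (int i = 0; i < nbThreads; i++) {
            size_t start = (size_t)i * chunkSize;
            size_t remaining = (start < srcSize) ? srcSize - start : 0;
            size_t sz = (remaining < chunkSize) ? remaining : chunkSize;
            job[i] = (chunk_job){ src + start, sz, malloc(sz + sz/255 + 64), 0 };
            pthread_create(&th[i], NULL, worker, &job[i]);
        }
        for (int i = 0; i < nbThreads; i++) {
            pthread_join(th[i], NULL);   /* compressed chunks are then written out in order */
            free(job[i].dst);
        }
    }
    Since each chunk is self-contained, adding cores mostly means adding chunks in flight, which is why this scheme spreads so easily.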

    Zhuff is already very fast on a single core, therefore testing its multi-core sibling requires ultra (!) fast I/O, something like a RAM drive. Even then, Zhuff might be able to saturate bandwidth on quad-core systems, something I cannot test myself on my rig, but would be glad to get some reports on.

    Alternatively, you may use the internal benchmark option for testing, which is pure in-memory compression (no I/O).
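
    To make clear what "pure in-memory" means, here is a rough sketch of such a benchmark, again using the hypothetical compress_chunk() from the sketch above (Zhuff's actual benchmark code is not shown here, and it presumably also handles multiple threads and repeated runs, which this sketch omits):
    Code:
    /* Rough sketch of an in-memory benchmark: the file is read into RAM once,
       then only the compressor itself is timed, so disk speed never enters
       the measurement.  compress_chunk() is the hypothetical function above. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    extern size_t compress_chunk(const char* src, size_t srcSize, char* dst);

    void benchmark_file(const char* filename)
    {
        FILE* f = fopen(filename, "rb");
        if (!f) return;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);

        char* src = malloc(size);
        char* dst = malloc(size + size/255 + 64);
        fread(src, 1, size, f);
        fclose(f);

        clock_t start = clock();               /* CPU time of this single-threaded run */
        size_t csize = compress_chunk(src, size, dst);
        double sec = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("%ld -> %zu bytes (%.2f%%) at %.1f MB/s\n",
               size, csize, 100.0 * csize / size, (size / (1024.0*1024.0)) / sec);
        free(src); free(dst);
    }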

    Some early results were obtained using this method, on a Core 2 Duo E8400 (3 GHz), Windows 7 32-bit.


    You can download and test this new version at its homepage :
    http://fastcompression.blogspot.com/p/zhuff.html
    Last edited by Cyan; 23rd March 2011 at 12:13.

  2. #2
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    Quick test before I go to bed.

    Zhuff v0.7 (compiled Mar 15 2011)
    Fast Compression Software, by Yann Collet
    Nb of threads = 1
    Benchmarking 'Oblivion - Sounds.bsa', please wait...
    Compressed 330432865 bytes into 278968884 bytes (84.43%) at 160.2 MB/s.
    Checksum OK - Decompressed at 236.4 MB/s.

    Zhuff v0.7 (compiled Mar 15 2011)
    Fast Compression Software, by Yann Collet
    Nb of threads = 2
    Benchmarking 'Oblivion - Sounds.bsa', please wait...
    Compressed 330432865 bytes into 278968884 bytes (84.43%) at 315.8 MB/s.
    Checksum OK - Decompressed at 466.5 MB/s.

    C2D @ 3.3 GHz, 4 GB with a 512 MB pagefile sitting on a ramdisk, on XP.
    Relaxed timings on the memory at the moment (5-5-5-18, normally 4-4-4-12).

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Any chance for a Linux version?

    BTW:
    Does it use plain vanilla MMC or version with Secondary Promotions?
    Last edited by Piotr Tarsa; 17th March 2011 at 03:57.

  4. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Thanks for testing, Intrinsic. This is damn close to linear ramping on your C2D system.

    Anyone with a quad-core?

    Does it use plain vanilla MMC or version with Secondary Promotions?
    Hi Piotr; neither, it's a fast scanning strategy.
    But it's not the first time I've been asked this question.
    Maybe I should consider releasing a Zhuff-HC version after all...
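
    For those wondering what "fast scanning" means in contrast to MMC's chained searches, here is a generic illustration; this is not Zhuff's code, and the hash function, table size and minimum match length are arbitrary choices. The point is that there is a single hash-table probe per position, and the slot is always overwritten with the newest position, so there is no chain to walk.
    Code:
    /* Generic illustration of a "fast scanning" match finder (not Zhuff's code).
       The table must be zero-initialized; positions are stored as offsets from base. */
    #include <stdint.h>
    #include <string.h>

    #define HASH_LOG  16
    #define MIN_MATCH 4

    static uint32_t hash4(const uint8_t* p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> (32 - HASH_LOG);   /* Knuth multiplicative hash */
    }

    /* Returns a previous position sharing at least MIN_MATCH bytes with ip,
       or NULL when the single probe does not match. */
    static const uint8_t* find_match(const uint8_t* base, const uint8_t* ip,
                                     uint32_t table[1 << HASH_LOG])
    {
        uint32_t h = hash4(ip);
        const uint8_t* candidate = base + table[h];
        table[h] = (uint32_t)(ip - base);              /* always overwrite: no chain */
        if (candidate < ip && memcmp(candidate, ip, MIN_MATCH) == 0)
            return candidate;
        return NULL;
    }
    This trades match quality (hence ratio) for raw speed, which fits Zhuff's positioning.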

    Regards
    Last edited by Cyan; 19th March 2011 at 01:38.

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Q9450 @ 3.52 GHz, ramdrive, enwik9.
    Speeds are taken from zhuff's output; timings are external (and the corresponding speeds are a little lower).
    Code:
    th  c.size    c.sp      d.sp    c.tim  d.tim
    1 - 365122888 147.8MB/s 271MB/s 6.782s 3.688s
    2 - 365122888 286.9MB/s 503MB/s 3.485s 2.000s
    3 - 365122888 387.7MB/s 561MB/s 2.594s 1.797s
    4 - 365122888 429.4MB/s 556MB/s 2.359s 1.812s
    As a speed optimization, I'd suggest printing stats only every 0.5 s or so
    (or at least providing an option to disable them).
    A fast time check can be done via rdtsc.
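
    For reference, a minimal sketch of that suggestion might look like the following; the __rdtsc intrinsic is real, but the function name and the cycles-per-update threshold are assumptions (the threshold would need calibrating to the actual CPU frequency):
    Code:
    /* Sketch: update the console at most every ~0.5 s, using the TSC
       as a cheap time source.  Threshold assumes a ~3 GHz CPU. */
    #include <stdint.h>
    #include <stdio.h>
    #ifdef _MSC_VER
    #  include <intrin.h>
    #else
    #  include <x86intrin.h>
    #endif

    #define CYCLES_PER_UPDATE 1500000000ULL   /* ~0.5 s at 3 GHz */

    void maybe_print_stats(uint64_t done, uint64_t total)
    {
        static uint64_t lastTick = 0;
        uint64_t now = __rdtsc();
        if (now - lastTick < CYCLES_PER_UPDATE) return;   /* most calls exit here, cheaply */
        lastTick = now;
        fprintf(stderr, "\r%6.2f%% processed", 100.0 * (double)done / (double)total);
    }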

  6. #6
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Thanks for testing, Eugene. Quite a fast rig you have, btw.

    So I guess RAM drive I/O saturation (which results in one full core entirely dedicated to running the RAM drive driver) is reached at around 2.3 threads for decoding,
    but is never reached for compression, although it is probably not far off.

    As a speed optimization, I'd suggest printing stats only every 0.5 s or so
    (or at least providing an option to disable them).
    Good point. I'll use your suggestion in the next releases.

  7. #7
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    788
    Thanks
    64
    Thanked 274 Times in 192 Posts
    Zhuff compression tested on a 2 x 4-core 2.4 GHz Xeon with 6 x 7200 rpm disks in RAID5, input file of 807,821,312 bytes of HTML/text:

    zhuff -t1:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 4.75s ==> 170.0MB/s
    Kernel Time = 0.328 = 00:00:00.328 = 6%
    User Time = 4.765 = 00:00:04.765 = 100%
    Process Time = 5.093 = 00:00:05.093 = 107%
    Global Time = 4.750 = 00:00:04.750 = 100%

    zhuff -t2:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 2.41s ==> 335.6MB/s
    Kernel Time = 0.453 = 00:00:00.453 = 18%
    User Time = 4.781 = 00:00:04.781 = 197%
    Process Time = 5.234 = 00:00:05.234 = 216%
    Global Time = 2.422 = 00:00:02.422 = 100%

    zhuff -t3:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 1.63s ==> 496.8MB/s
    Kernel Time = 0.437 = 00:00:00.437 = 26%
    User Time = 4.828 = 00:00:04.828 = 297%
    Process Time = 5.265 = 00:00:05.265 = 324%
    Global Time = 1.625 = 00:00:01.625 = 100%

    zhuff -t4:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 1.25s ==> 645.7MB/s
    Kernel Time = 0.531 = 00:00:00.531 = 42%
    User Time = 4.828 = 00:00:04.828 = 386%
    Process Time = 5.359 = 00:00:05.359 = 428%
    Global Time = 1.250 = 00:00:01.250 = 100%

    zhuff -t5:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 1.02s ==> 794.3MB/s
    Kernel Time = 0.500 = 00:00:00.500 = 49%
    User Time = 4.812 = 00:00:04.812 = 473%
    Process Time = 5.312 = 00:00:05.312 = 522%
    Global Time = 1.016 = 00:00:01.016 = 100%

    zhuff -t6:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 0.84s ==> 956.0MB/s
    Kernel Time = 0.406 = 00:00:00.406 = 47%
    User Time = 4.843 = 00:00:04.843 = 563%
    Process Time = 5.250 = 00:00:05.250 = 611%
    Global Time = 0.859 = 00:00:00.859 = 100%

    zhuff -t7:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 0.73s ==> 1097.6MB/s
    Kernel Time = 0.578 = 00:00:00.578 = 77%
    User Time = 4.906 = 00:00:04.906 = 654%
    Process Time = 5.484 = 00:00:05.484 = 731%
    Global Time = 0.750 = 00:00:00.750 = 100%

    zhuff -t8:
    Compression completed : 770.4MB --> 166.7MB (21.64%) (174841471 Bytes)
    Total Time : 0.72s ==> 1122.0MB/s
    Kernel Time = 0.468 = 00:00:00.468 = 63%
    User Time = 4.953 = 00:00:04.953 = 674%
    Process Time = 5.421 = 00:00:05.421 = 738%
    Global Time = 0.734 = 00:00:00.734 = 100%

  8. #8
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Thanks very much for testing, Sportman

    > 2 x 4 core 2.4 GHz Xeon and 6 x 7200 rpm RAID5 disks
    that's a monstrous configuration

    Anyway, such a config gives you a much better speed than a RAM drive, while using less CPU too (about 20%, apparently).

    The speed wall seems to be reached at about 1100 MB/s, which is just, well, the fastest I've ever seen, and up to this point the ramp-up is relatively linear (about +150 MB/s for each added core), so I guess this can be considered good.
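
    For what it's worth, the ramp-up can be quantified directly from the speeds reported above; a throwaway snippet like this (purely illustrative, speeds copied from Sportman's post) prints the per-thread increment and the parallel efficiency relative to a single thread:
    Code:
    /* Per-thread speed increments and parallel efficiency, from the speeds above */
    #include <stdio.h>

    int main(void)
    {
        double speed[8] = { 170.0, 335.6, 496.8, 645.7, 794.3, 956.0, 1097.6, 1122.0 };
        for (int t = 1; t < 8; t++)
            printf("%d -> %d threads : +%6.1f MB/s, efficiency %.0f%%\n",
                   t, t + 1, speed[t] - speed[t - 1],
                   100.0 * speed[t] / ((t + 1) * speed[0]));
        return 0;
    }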

    Regards

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,366
    Thanks
    212
    Thanked 1,018 Times in 540 Posts
    Note that his file is relatively redundant, with almost 2x better compression than enwik9.
    I think zhuff could reach an even better speed on 1 GB of zeroes.
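
    For anyone wanting to try that, a trivial way to create such a file is a few lines of C (file name and buffer size below are arbitrary); on Unix, dd from /dev/zero does the same job:
    Code:
    /* Writes a 1 GB file of zero bytes ("zeroes.bin"), as a best-case redundancy test */
    #include <stdio.h>

    int main(void)
    {
        static char buf[1 << 20];               /* 1 MB; static arrays are zero-initialized */
        FILE* f = fopen("zeroes.bin", "wb");
        if (!f) return 1;
        for (int i = 0; i < 1024; i++)          /* 1024 x 1 MB = 1 GB */
            fwrite(buf, 1, sizeof(buf), f);
        fclose(f);
        return 0;
    }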

  10. #10
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Yeah, you're right.
    I think the worst speed is actually reached when compressing raw (uncompressed) images, with compression slowed down to "just" 100 MB/s per core. Which is still reasonable.

    I'm actually much more interested in the regular ramp-up of this 8-core configuration. This is a very good report for the MT code.
    Last edited by Cyan; 23rd March 2011 at 01:08.

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    zhuff results. http://mattmahoney.net/dc/text.html#3651

    On my laptop, wall times with -t1 and -t2 are the same because the program is already faster than the disk using 1 thread.

  12. #12
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Thanks for testing, Matt
