
Thread: Multi-threading motivation

  1. #1
    Member

    Multi-threading motivation

    A lot of research has gone into tighter compression ratios at the expense of all else, but I just wanted to throw a bone to multithreading. At my workplace, I recently got hold of a Sun T5120, a relatively modest multi-core server. I write "modest" because, while it has eight cores, each core is quite slow (1GHz). What this architecture does well, though, is switch to another thread while the current one is waiting for memory or bus access, and to make that possible each core has eight "hardware" threads.

    Here's how dramatic this can be if you give it the right multithreaded application: I took an 88MB test file, compressed it with bzip2, gzip, and pbzip2, and measured the time each one took. (I did not test pigz because I could not get it to compile and run properly on Solaris 10.) pbzip2 takes advantage of the fact that bzip2 processes files in 900KB blocks, so it reads 900KB chunks and hands each one to a CPU. To the operating system, 8 cores with 8 threads each look like 64 separate CPUs, so 64 different blocks are being compressed at a time.

    Here are the results:

    Code:
    # time bzip2 -v test.bin
      test.bin:  7.226:1,  1.107 bits/byte, 86.16% saved, 84477952 in, 11690074 out.
    1m35.35s
    
    # time gzip -v test.bin
    test.bin:        80.6% -- replaced with test.bin.gz
    0m28.70s
    
    # pbzip2 -v test.bin
    Parallel BZIP2 v1.0.2 - by: Jeff Gilchrist [http://compression.ca]
    [July 25, 2007]             (uses libbzip2 by Julian Seward)
    
             # CPUs: 64
     BWT Block Size: 900k
    File Block Size: 900k
    -------------------------------------------
             File #: 1 of 1
         Input Name: test.bin
        Output Name: test.bin.bz2
    
         Input Size: 84477952 bytes
    Compressing data...
        Output Size: 11757110 bytes
    -------------------------------------------
    
         Wall Clock: 4.678694 seconds
    So let's review:

    • bzip2: 95 seconds
    • gzip: 28 seconds
    • pbzip2: 5 seconds


    So although it didn't scale linearly (it probably could have if I had given it a much larger file: 84,477,952 bytes in 900KB blocks is only about 94 blocks, so across 64 hardware threads some threads handled one block while others handled two), pbzip2 was nearly 20 times faster than bzip2. With much larger files (e.g. 8GB) and fewer CPUs (e.g. 4), I see a near-linear speedup using pbzip2.
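
    To make the mechanism concrete, here is a minimal sketch of the split/compress/concatenate idea in C++ using libbzip2, with one thread per 900KB chunk. This is not pbzip2's actual code (pbzip2 streams its input and reuses a fixed pool of worker threads); it just shows the shape of the approach, which works because concatenated bzip2 streams are still a valid .bz2 file:

    Code:
    // Sketch only, NOT pbzip2's implementation: one thread per 900KB chunk,
    // each chunk compressed as an independent bzip2 stream, streams written
    // out in order (concatenated .bz2 streams decompress as one file).
    // Build with something like: g++ -O2 -pthread parbzip_sketch.cpp -lbz2
    #include <bzlib.h>
    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>
    
    int main(int argc, char** argv) {
        if (argc != 2) { std::fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    
        // Read the whole input file into memory (fine for a sketch).
        std::FILE* in = std::fopen(argv[1], "rb");
        if (!in) { std::perror(argv[1]); return 1; }
        std::vector<char> data;
        char buf[65536];
        size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, in)) > 0)
            data.insert(data.end(), buf, buf + n);
        std::fclose(in);
    
        const size_t kChunk = 900 * 1000;            // matches bzip2's -9 block size
        const size_t nchunks = (data.size() + kChunk - 1) / kChunk;
        std::vector<std::vector<char>> out(nchunks);
        std::vector<std::thread> workers;
    
        for (size_t i = 0; i < nchunks; ++i) {
            workers.emplace_back([&, i] {
                const size_t off = i * kChunk;
                unsigned srcLen = (unsigned)std::min(kChunk, data.size() - off);
                unsigned dstLen = srcLen + srcLen / 100 + 600;  // worst case per the bzlib docs
                out[i].resize(dstLen);
                BZ2_bzBuffToBuffCompress(out[i].data(), &dstLen,
                                         data.data() + off, srcLen,
                                         9 /* 900k block size */, 0, 0);
                out[i].resize(dstLen);               // shrink to the actual compressed size
            });
        }
        for (auto& t : workers) t.join();
    
        // Write the independent streams back out in their original order.
        std::FILE* outf = std::fopen((std::string(argv[1]) + ".bz2").c_str(), "wb");
        for (const auto& c : out) std::fwrite(c.data(), 1, c.size(), outf);
        std::fclose(outf);
        return 0;
    }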

    My question to everyone: Can PAQ/CM-style compressors be sped up in a similar fashion? If so, what would be the general way of doing this? (Disclaimer: I know LZ77/LZ78 very well but my PAQ/CM knowledge is weak)

  2. #2
    Member Zonder
    Can PAQ/CM-style compressors be sped up in a similar fashion?
    Yes.

    If so, what would be the general way of doing this?
    In theory, the archiver should split the data into blocks and compress them in parallel...

    EDIT:
    but "everybody" is too lazy to do this, plus the compression ratio will likely drop with this method, and you will be limited to fast modes because of RAM shortage (each block being compressed in parallel needs its own copy of the model in memory).
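
    Roughly, such a block-splitting archiver could look like the sketch below. cm_compress_block() is a made-up placeholder (not any real library's API) standing in for whatever CM coder you plug in, started from a fresh model for each block; the worker pool is bounded so only as many models sit in RAM as there are threads, and a small header of block lengths makes the output decodable (in parallel, too):

    Code:
    // Hedged sketch of a block-splitting archiver front end.
    // cm_compress_block() is a hypothetical placeholder, not any real
    // library's API; the stub just copies its input so the sketch builds.
    #include <algorithm>
    #include <atomic>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>
    
    // Placeholder: "compress" `in` into `out`, starting from a fresh model.
    static void cm_compress_block(const std::vector<uint8_t>& in, std::vector<uint8_t>& out) {
        out = in;   // stub; a real coder would run its model + arithmetic coder here
    }
    
    int main() {
        std::vector<uint8_t> data;                   // assume the input file was read into here
        const size_t kBlock = 8u << 20;              // larger blocks cost less ratio than 900KB ones
        const size_t nblocks = (data.size() + kBlock - 1) / kBlock;
        std::vector<std::vector<uint8_t>> packed(nblocks);
    
        // Bounded pool: one worker per hardware thread pulls block indices from
        // a shared counter, so 64 blocks never mean 64 models resident at once.
        std::atomic<size_t> next{0};
        const unsigned nworkers = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> pool;
        for (unsigned w = 0; w < nworkers; ++w)
            pool.emplace_back([&] {
                for (;;) {
                    const size_t i = next++;
                    if (i >= nblocks) break;
                    const size_t off = i * kBlock;
                    const size_t len = std::min(kBlock, data.size() - off);
                    std::vector<uint8_t> block(data.begin() + off, data.begin() + off + len);
                    cm_compress_block(block, packed[i]);   // fresh model per block
                }
            });
        for (auto& t : pool) t.join();
    
        // Trivial container: block count, each block's compressed length, then
        // the payloads in order, so a decoder can find and decode blocks
        // independently.
        std::FILE* out = std::fopen("archive.blocks", "wb");
        const uint64_t count = nblocks;
        std::fwrite(&count, sizeof count, 1, out);
        for (const auto& p : packed) { const uint64_t len = p.size(); std::fwrite(&len, sizeof len, 1, out); }
        for (const auto& p : packed) std::fwrite(p.data(), 1, p.size(), out);
        std::fclose(out);
        return 0;
    }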

    check the 4x4 compressor - http://haskell.org/bz/
    The 7-Zip and NanoZip authors are doing / will be doing something similar.
    Last edited by Zonder; 10th September 2008 at 05:35.
