
Thread: Parallel BZIP2 (PBZIP2)

  1. #1
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts

    Parallel BZIP2 (PBZIP2)

    new version 1.0.5 released (Jan. 08, 2009)
    ---
    http://compression.ca/pbzip2/
    ---
    open source software under a BSD-style license
    ---
    PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines.
    The output of this version is fully compatible with bzip2 v1.0.2 or newer
    (i.e., anything compressed with pbzip2 can be decompressed with bzip2).
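To illustrate why block-parallel output can stay readable by plain bzip2, here is a rough Python sketch. It is not pbzip2's actual code: the function name `parallel_bz2_compress` and the thread-pool approach are assumptions made for the example. The idea shown is that each block is compressed as an independent bzip2 stream, the streams are concatenated in order, and a standard decompressor that accepts multi-stream input reads the result back.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# 900k, the block size pbzip2 reports by default; the helper itself is hypothetical.
BLOCK_SIZE = 900 * 1000


def parallel_bz2_compress(data: bytes, workers: int = 2,
                          block_size: int = BLOCK_SIZE) -> bytes:
    """Compress fixed-size blocks independently and concatenate the streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams are joined in file order.
        streams = list(pool.map(bz2.compress, blocks))
    return b"".join(streams)


sample = b"parallel bzip2 " * 200_000  # about 3 MB, so several blocks
packed = parallel_bz2_compress(sample, workers=2)
# bz2.decompress() accepts concatenated streams (Python 3.3+).
restored = bz2.decompress(packed)
```

The trade-off of this layout is that every block carries its own stream header and is compressed without knowledge of its neighbours, so the output is usually not byte-identical to what a single-stream compressor would write.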
    ---
    has anyone seen this running under Windows, maybe with Cygwin?
    ---
    it sounds interesting, but I don't see any benchmarks (Core 2 Duo?)

    best regards
    Last edited by joerg; 14th January 2009 at 02:04.

  2. #2
    Member
    Join Date
    Jul 2006
    Location
    US
    Posts
    39
    Thanks
    26
    Thanked 1 Time in 1 Post
    Another new version has been released.

    v1.1.0 (Mar 13, 2010)

    http://compression.ca/pbzip2/

    No build yet for Windows.

  3. #3
    Member
    Join Date
    May 2009
    Location
    Europe
    Posts
    67
    Thanks
    0
    Thanked 1 Time in 1 Post
    Originally posted by joerg:
    new version 1.0.5 released (Jan. 08, 2009)
    has anyone seen this running under Windows, maybe with Cygwin?
    ---
    it sounds interesting, but I don't see any benchmarks (Core 2 Duo?)
    http://encode.dreamhosters.com/showthread.php?t=473

  4. #4
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    wonderful!

    thank you very much for your optimized compile:

    http://www.fileden.com/files/2006/8/...0.5_M4ST3R.rar

    in my test the resulting file is identical

    -------------------------------------------
    .\PBZIP2\pbzip2 -v -k ..\ORGDTA\db.dmp
    Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
    [Jan. 08, 2009] (uses libbzip2 by Julian Seward)

    # CPUs: 2
    BWT Block Size: 900k
    File Block Size: 900k
    -------------------------------------------
    File #: 1 of 1
    Input Name: ..\ORGDTA\db.dmp
    Output Name: ..\ORGDTA\db.dmp.bz2

    Input Size: 648331264 bytes
    Compressing data...
    -------------------------------------------

    Wall Clock: 116.087000 seconds

    .\PBZIP2\pbzip2_105_M4STR -v -k ..\ORGDTA\dbm.dmp
    Parallel BZIP2 v1.0.5 - by: Jeff Gilchrist [http://compression.ca]
    [Jan. 08, 2009] (uses libbzip2 by Julian Seward)

    # CPUs: 2
    BWT Block Size: 900k
    File Block Size: 900k
    -------------------------------------------
    File #: 1 of 1
    Input Name: ..\ORGDTA\dbm.dmp
    Output Name: ..\ORGDTA\dbm.dmp.bz2

    Input Size: 648331264 bytes
    Compressing data...
    -------------------------------------------

    Wall Clock: 91.060000 seconds

    -------------------------------------------

    that means your optimized compile is significantly faster (91.06 s vs. 116.09 s wall clock, about 22% less time)
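The arithmetic behind that comparison, as a small Python sketch. The input size and wall-clock times are taken from the two logs above; the helper name `compare_runs` is made up for the example, and a single run per build is of course not a rigorous benchmark.

```python
def compare_runs(input_bytes: int, baseline_s: float, candidate_s: float) -> dict:
    """Return speedup, relative time saved and throughput for two timed runs."""
    return {
        "speedup": baseline_s / candidate_s,
        "time_saved": 1 - candidate_s / baseline_s,
        "baseline_mb_s": input_bytes / baseline_s / 1e6,
        "candidate_mb_s": input_bytes / candidate_s / 1e6,
    }


# Values from the verbose output above: 648331264 bytes, 116.087 s vs. 91.060 s.
result = compare_runs(648_331_264, 116.087, 91.060)
print(f"speedup {result['speedup']:.2f}x, "
      f"time saved {result['time_saved']:.1%}, "
      f"{result['baseline_mb_s']:.2f} -> {result['candidate_mb_s']:.2f} MB/s")
```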

    question 1:
    you wrote: "no pthreadGC2.dll needed"
    what have you optimized?
    question 2:
    can you please try to make a compile of the new version 1.1.0?

    best regards

  5. #5
    Member
    Join Date
    May 2009
    Location
    Europe
    Posts
    67
    Thanks
    0
    Thanked 1 Time in 1 Post
    Originally posted by joerg:
    that means your optimized compile is significantly faster
    Good, the gain was smaller on my computer.

    Originally posted by joerg:
    you wrote: "no pthreadGC2.dll needed"
    what have you optimized?
    Nothing. It's just that the "official" build is dynamically linked to the DLL (you need to have it somewhere on your system), while my build is statically linked (you don't need the DLL).
    The only change I made is automatic detection of the number of cores.
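For anyone using a build without that auto-detection, the same effect can be approximated from a wrapper script. A minimal Python sketch, assuming pbzip2 is on the PATH; the helper name `pbzip2_command` is hypothetical, while `-p#`, `-k` and `-v` are the options shown in pbzip2's own help text.

```python
import os


def pbzip2_command(path: str, cpus=None) -> list:
    """Build a pbzip2 command line with an explicit processor count."""
    if cpus is None:
        # os.cpu_count() can return None; fall back to 2, the default
        # that pbzip2's help text documents for -p#.
        cpus = os.cpu_count() or 2
    return ["pbzip2", f"-p{cpus}", "-k", "-v", path]


command = pbzip2_command("db.dmp")
# To actually run it: subprocess.run(command, check=True)
```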
    Originally posted by joerg:
    can you please try to make a compile of the new version 1.1.0?
    Not now. I recently did a fresh install of Windows 7 and I haven't installed my dev tools yet. I'm waiting for the release of Visual Studio (in a few weeks).

  6. #6
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    thank you for your quick answer!

    you are waiting for Microsoft Visual Studio 2010 ?

    have you tried the
    Microsoft Visual Studio 2010 Ultimate RC - ISO ?

    download:

    http://download.microsoft.com/downlo...2010Ult_RC.iso

    if possible (and if no modification is necessary) then
    maybe it would be better to leave the pthreadGC2.dll
    in its original state?

    okay - waiting for a new win32 compile of pbzip2 1.1.0

    best regards

  7. #7
    Member
    Join Date
    May 2009
    Location
    Europe
    Posts
    67
    Thanks
    0
    Thanked 1 Time in 1 Post
    Originally posted by joerg:
    if possible (and if no modification is necessary) then
    maybe it would be better to leave the pthreadGC2.dll
    in its original state?
    Sure, it's only one compiler switch IIRC

  8. #8
    Member
    Join Date
    May 2007
    Location
    Poland
    Posts
    91
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Originally posted by joerg:
    that means your optimized compile is significantly faster
    M4ST3R's compiles are fastest by default

  9. #9
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @M4ST3R

    pbzip2 = parallel bzip2 (http://compression.ca/pbzip2/)

    version 1.0.5 is now listed on http://compressionratings.com/i_pbzip2.html

    and

    pigz = parallel implementation of gzip (http://www.zlib.net/pigz/)

    version 2.1.4 is now listed on http://compressionratings.com/i_pigz.html

    pbzip2 version 1.1.0 and pigz 2.1.6 are now available

    compressionratings.com tested the compiled versions from www.leszer.net

    ---
    you wrote: I wait for the release of visual studio
    ---

    @M4ST3R

    i think, it would be very interesting to see
    how fast your compiles work on this benchmark

    again:
    can you please try to make a win32 compile of the new pbzip2 1.1.0
    and maybe also a win32 compile of the new pigz 2.1.6?

    thank you very much in advance

    best regards

  10. #10
    Member
    Join Date
    May 2009
    Location
    Europe
    Posts
    67
    Thanks
    0
    Thanked 1 Time in 1 Post
    Sorry but don't expect a build from me until:
    -VS 2010 is available in my language
    -Intel releases a new version of the compiler that officially supports VS 2010

  11. #11
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    thank you very much for your quick answer

    sorry, for now
    there is only Intel C++ version 11.1 Update 5 (February 2010),
    which supports Visual Studio 2008

    best regards ... waiting

  12. #12
    Member
    Join Date
    May 2009
    Location
    Europe
    Posts
    67
    Thanks
    0
    Thanked 1 Time in 1 Post
    With some hacks I managed to make ICL somewhat usable with VS2010.

    Unfortunately, the newest versions of pbzip2 can't be compiled with Visual Studio or ICL because they use some POSIX signals. I'm not even sure MinGW would compile them. Cygwin may work, though.

  13. #13
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @M4ST3R

    thank you very very much for your answer

    * With some hacks I managed to make ICL somewhat usable with VS2010.
    * Unfortunately, the newest versions of pbzip2 can't be compiled
    * with Visual Studio or ICL because they use some POSIX signals.
    * I'm not even sure MinGW would compile them. Cygwin may work, though.

    the latest source version is now from 17 Apr 2010

    http://compression.ca/pbzip2/pbzip2-1.1.1.tar.gz

    leszer made the latest win32 compile
    with the "www.bloodshed.net" compiler Dev-C++ 4.9.9.2
    (it includes a full MinGW compiler system)

    from

    http://prdownloads.sourceforge.net/d....9.2_setup.exe

    pbzip2 seems to scale very well on multicore systems

    http://compression.ca/pbzip2/bench-multi.gif

    Thank you in advance for your efforts

    best regards

  14. #14
    Member
    Join Date
    May 2010
    Location
    Austria
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    just tried to compile the new v1.1.1 of pbzip2:

    • the standard compile was fine, although i was not able to create a static version
    • compression with v1.1.1 achieves almost the same performance as v1.1.0; in fact it was about 1% - 2% faster than v1.1.0
    • decompression unfortunately is far slower than with v1.1.0 and simply stops after decompressing about 63% of my 2.5GB test file!

    so i continue to use the v1.1.0 i compiled a few weeks ago.

    if anybody is interested, my v1.1.0 can be found here:

    http://rapidshare.com/files/38525238...1.1.0.rar.html

    i used cygwin to compile on windows. as i'm not too experienced in software development, i have no idea how to put the required cygwin dlls into pbzip2.exe, if that's possible at all. so in order to use my compiled pbzip2.exe you need to have the cygwin dll files contained in the archive in the same directory as pbzip2.exe.

    there is also another problem with this compile: sometimes the compression simply hangs, and cpu usage and io drop to zero. pressing ctrl-c in this situation makes pbzip2 continue. after decompressing and comparing the resulting file with the original file, both files are identical, so it seems the compression was fine anyway.

    although this happens very rarely, it's still annoying. i have no clue whether this is a result of my compile or a problem with the pbzip2 source. need to contact the author ...
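A round-trip check like the one described can be scripted, so a hang or a bad archive is caught without decompressing to disk by hand. A rough Python sketch; the helper names `sha256_of` and `roundtrip_ok` are made up here, and it relies on Python's `bz2.open`, which reads multi-stream .bz2 files.

```python
import bz2
import hashlib


def sha256_of(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files never sit fully in RAM."""
    digest = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def roundtrip_ok(original_path: str, compressed_path: str) -> bool:
    """True if the .bz2 file decompresses to exactly the original bytes."""
    with open(original_path, "rb") as raw, \
            bz2.open(compressed_path, "rb") as unpacked:
        return sha256_of(raw) == sha256_of(unpacked)
```

Typical use would be `roundtrip_ok("db.dmp", "db.dmp.bz2")` after compressing with `-k` so the original is kept.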
    Last edited by fgw; 9th May 2010 at 15:35.

  15. #15
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    a win32 version of the new pbzip2 1.1.1 is now available for download

    on the home page of Eduardo Terol (http://www.leszer.net/index.php/my-software.html):

    D:\COMPRESS\PBZIP2-111>pbzip2 -h

    Parallel BZIP2 v1.1.1 - by: Jeff Gilchrist [http://compression.ca]
    [Apr. 17, 2010] (uses libbzip2 by Julian Seward)
    Major contributions: Yavor Nikolov <nikolov.javor+pbzip2@gmail.com>

    Usage: pbzip2 [-1 .. -9] [-b#cdfhkm#p#qrS#tVz] <filename> <filename2> <filenameN>
     -1 .. -9        set BWT block size to 100k .. 900k (default 900k)
     -b#             Block size in 100k steps (default 9 = 900k)
     -c,--stdout     Output to standard out (stdout)
     -d,--decompress Decompress file
     -f,--force      Overwrite existing output file
     -h,--help       Print this help message
     -k,--keep       Keep input file, don't delete
     -m#             Maximum memory usage in 1MB steps (default 100 = 100MB)
     -p#             Number of processors to use (default 2)
     -q,--quiet      Quiet mode (default)
     -r,--read       Read entire input file into RAM and split between processors
     -t,--test       Test compressed file integrity
     -v,--verbose    Verbose mode
     -V,--version    Display version info for pbzip2 then exit
     -z,--compress   Compress file (default)

    Example: pbzip2 -b15vk myfile.tar
    Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
    Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
    Example: pbzip2 -d -m500 myfile.tar.bz2

    it seems that the errors from version 1.1.0 are fixed.

    in my tests the resulting archive files are identical to those produced by version 1.0.5

    there is a new commandline parameter in version 1.1.1:

    -m# Maximum memory usage in 1MB steps (default 100 = 100MB)

    the new version seems to work a little bit faster


    many thanks @Eduardo_Terol

    best regards

  16. #16
    Member
    Join Date
    Jul 2006
    Location
    US
    Posts
    39
    Thanks
    26
    Thanked 1 Time in 1 Post
    New version has been released. (source, no Win binary)

    v1.1.2 (Feb 19, 2011)

    http://compression.ca/pbzip2/

    Changes:
    * Fix directdecompress segfault when destination file can't be opened (e.g. read-only) (bug #717852)
    * Implemented --ignore-trailing-garbage feature (bug #59486
    * Fixed hang on decompress of some truncated archives (bug #590225)
    * Pulled an error check out of normal logic block for clarity
    * Debug print added after BZ2_bzDecompress to track its return code
    * A debug print fixed in queue::remove
    * Increased max memory usage limit from 1GB to 2GB
    * If no -m switch given on command line, default max memory limit will now automatically increase from 100 MB to minimum amount of memory required to support the number of CPUs requested
    * Improved performance when output buffer is full
    * Fixed bug which caused hang while decompressing prematurely truncated bzip2 stream
    * Consumer_decompress throttling modified to prevent potential deadlock/infinite loop in certain situations (Thanks to Laszlo Ersek for finding and helping track down the cause of this bug)
    * Fixed deadlock bug and performance issue when consumer working with long bzip2 sequences (Thanks to Tanguy Fautre for finding)
    * Fixed error message for block size range (max size was wrong)
    * Moved #include <pthread.h> from pbzip2.cpp to pbzip2.h to fix OS/2 compiler issue

  17. #17
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    there is a new version v1.1.3 - Mar 27, 2011 on http://www.compression.ca/pbzip2/

    bugs fixed:

    * Fixed hang on decompress with --ignore-trailing-garbage=1 and higher numCPU (e.g. > 2) (bug #740502)
    * Print trailing garbage errors even when in quiet mode (bug #743635)
    * Default extension on decompress of .tbz2 changed to .tar for bzip2 compatibility (bug #743639)
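The renaming rule from the last item can be sketched as a small lookup. Only the `.tbz2` to `.tar` mapping comes from the changelog above; the other suffixes and the `.out` fallback are assumptions modelled on how bzip2 names its output, and the function name `decompressed_name` is made up for the example.

```python
# Suffix -> replacement. Only ".tbz2" -> ".tar" is stated in the changelog;
# the remaining entries are assumed from bzip2's naming convention.
SUFFIX_MAP = (
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
    (".bz2", ""),
    (".bz", ""),
)


def decompressed_name(name: str) -> str:
    """Guess the output file name for a compressed input file name."""
    for suffix, replacement in SUFFIX_MAP:
        if name.endswith(suffix):
            return name[:-len(suffix)] + replacement
    # Unknown suffix: append ".out" rather than overwrite the input (assumption).
    return name + ".out"
```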

    source: http://www.compression.ca/pbzip2/pbzip2-1.1.3.tar.gz - for now there is no windows-binary of the new version ...

  18. #18
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    there is a new version v1.1.5 - Jul 16, 2011 on http://www.compression.ca/pbzip2/

    bugfix:

    * Fixed excessive output permissions while compress/decompress is in progress
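The changelog does not say how this was fixed. One common pattern for this class of bug, sketched here in Python under that caveat, is to create the output file with owner-only permissions and copy the input file's mode onto it only once writing has finished. The helper names `create_private_output` and `finish_output` are hypothetical, and the sketch assumes a POSIX system.

```python
import os


def create_private_output(path: str):
    """Create the output file readable and writable by the owner only."""
    # O_EXCL refuses to reuse an existing file; 0o600 keeps group/other out
    # while the file is still being written.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    return os.fdopen(fd, "wb")


def finish_output(path: str, source_path: str) -> None:
    """After a successful write, give the output the input file's mode."""
    os.chmod(path, os.stat(source_path).st_mode & 0o777)
```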

    source: http://compression.ca/pbzip2/pbzip2-1.1.5.tar.gz

    for now there is no windows-binary of the new version ...

    it would be interesting to see it running on Intel Core i7 ...

  19. #19
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts

    pigz = parallel gzip - new version 2.2.3 (15 Jan 2012)

    there is a new pigz version 2.2.3 - 2012-01-15 on http://zlib.net/pigz

    source: http://zlib.net/pigz/pigz-2.2.3.tar.gz
    ---
    changes:

    2.1.7  2011-12-17  Avoid unused parameter warning in reenter()
                       Don't assume 2's complement ints in compress_thread()
                       Replicate gzip -cdf cat-like behavior
                       Replicate gzip -- option to suppress option decoding
                       Test output from make test instead of showing it
                       Updated pigz.spec to install unpigz, pigz.1 [Obermaier]
                       Add PIGZ environment variable [Mueller]
                       Replicate gzip suffix search when decoding or listing
                       Fix bug in load() to set in_left to zero on end of file
                       Do not check suffix when input file won't be modified
                       Decompress to stdout if name is "*cat" [Hayasaka]
                       Write data descriptor signature to be like Info-ZIP
                       Update and sort options list in help
                       Use CC variable for compiler in Makefile
                       Exit with code 2 if a warning has been issued
                       Fix thread synchronization problem when tracing
                       Change macro name MAX to MAX2 to avoid library conflicts
                       Determine number of processors on HP-UX [Lloyd]
    2.2    2011-12-31  Check for expansion bound busting (e.g. modified zlib)
                       Make the "threads" list head global variable volatile
                       Fix construction and printing of 32-bit check values
                       Add --rsyncable functionality
    2.2.1  2012-01-01  Fix bug in --rsyncable buffer management
    2.2.2  2012-01-01  Fix another bug in --rsyncable buffer management
    2.2.3  2012-01-15  Remove volatile in yarn.c
                       Reduce the number of input buffers
                       Change initial rsyncable hash to comparison value
                       Improve the efficiency of arriving at a byte boundary
                       Add thread portability #defines from yarn.c
                       Have rsyncable compression be independent of threading
                       Fix bug where constructed dictionaries not being used

    To-do: make source portable for Windows, VMS, ... and make build portable (currently good for Unixish)
    ---

    there is also a newer version 1.1.6 - 2011-10-31 of pbzip2 on http://www.compression.ca/pbzip2/

    source: http://compression.ca/pbzip2/pbzip2-1.1.6.tar.gz

    (a version 1.1.7 seems to be in testing)

    bugfix:
    --
    Fixed bug - deadlock due to unsynchronized broadcasts (bug #876686)
    Prevent deletion of input files on error (bug #874543)
    Document how to compress/decompress from standard input (bug #820525)
    Added more detailed kernel error messages (bug #874605)
    Fixes for error handling in multi-file processing (bug #883782)
    --
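On the standard-input item: the pbzip2 documentation itself is the place to look for the exact invocation. As a generic illustration of stream compression only (not pbzip2's interface), here is a Python sketch that compresses any binary stream chunk by chunk with the standard `bz2` module; the function name `compress_stream` is made up for the example.

```python
import bz2


def compress_stream(src, dst, level: int = 9, chunk_size: int = 1 << 20) -> None:
    """Read src in chunks and write one bzip2 stream to dst."""
    compressor = bz2.BZ2Compressor(level)
    while chunk := src.read(chunk_size):
        dst.write(compressor.compress(chunk))
    # flush() emits whatever is still buffered plus the end-of-stream marker.
    dst.write(compressor.flush())


# As a filter: compress_stream(sys.stdin.buffer, sys.stdout.buffer)
```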

    for now there is no Windows binary of the new version ...

    it would be interesting to see it running on Intel Core i7 ...

    best regards

  20. #20
    Expert
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Comparison with gzip under Linux on a Core i7. http://mattmahoney.net/dc/text.html#3229

  21. #21
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @Matt Mahoney

    thank you very much for the test results

    have you compiled a 32-bit version or a 64-bit version ?

    can you please give us a download-link to your binary ?

  22. #22
    Expert
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I compiled for 64 bit Linux. Just download and "make". Windows would take more work, so I didn't pursue it.
