Results 1 to 14 of 14

Thread: BZIP2 1.0.6 for Windows

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts

    BZIP2 1.0.6 for Windows

    http://bzip.org

    My quick compile attached to this message!
    Attached Files Attached Files

  2. #2
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Thumbs up

    Hello everyone,

    thanks for sharing this

    Best regards!

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Smile BZIP2.1.0.6 with BWTS?

    I have not looked at BZIP2 or at least not for several years. But does any one have an idea how hard it would be to replace the BWT with a BWTS. Of course you could leave the headers and interface the same and have a phony index. But why waste the space with the index. It might be fun to just try it and I may do it. But since many have toyed with this do you think it would be hard to modify?

  4. #4
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    It would break compatibility with existing .bz2 files and not provide any advantage in speed or compression ratio besides saving 20 bits per 1 MB block for the starting index. People use bzip2 even though there are better compressors because they care about others being able to decompress the files later. Still, you should try it because maybe I am wrong and there are practical benefits. Breaking up a block into Lyndon words would lose context in stationary sources but might help compression when fast adaptation is needed, but then so would splitting a file into smaller blocks with different data types.

  5. #5
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Angry

    During my working career I was an expert at adding features yet keeping things compatible. It might be child's play to at least allow things compressed with older version to be compatible with a BWTS version. It depends on what the guy who wrote the software did for what he is claiming as the BWT every one does it different and I don't mean just the index. If he used the original BWT where a rotation in the input data only changes the index its very easy. If he at least used a version of BWT where if the input data rotated such that the BWT causes a zero value for index and the input at this point would be a single lyndon word at which point many of the different BWT methods give the same transformed value as the BWTS. If this is the case then you could use in decompression the old BWT routines if index not zero. If index zero you just use the BWTS routines. To keeps it upward compatible on decompression you make the index zero for the BWTSed part.
    This means old BZIP2 could compress and decompress files. While a new BWTS could at least decompress any old BWT BZIP2 files. In a slow mode you could compress blocks twice once with BWT and once with BWTS on decompression the fact that index not zero in a block by block bais would single the old style was used. This would mean any program compressed with new slow version would always compress the same or better. In other words a BWTS version of BZIP2 if one wished could always compress the same or better.

  6. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Did you say you were going to write it?

  7. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Question

    Not exactly. I would need to know just where and how the interface is in his code for BWT and UNBWT. I would also need to see what he gets for some simple rotations of short files. I use short data strings like BANANAS to test for what method is being used. If he does the real original one he will get a different index value for each rotation of the word but the same BWT data set which is the BWTS of ANANASB
    most implementations I have seen would get an index of 0 for the BWT of ANANASB which is BNNSAAA which is the same as the BWTS
    so I use his old method if index not zero when uncompressing and since BWTS would in this case always return 0 I would use the UNBWTS for this case. Its possible the BWT method he uses returns the single word case for a lyndon word rotated one character to right of left. In which case I would rotate the BWTS one character right of left to remain comparable. Look I moded BWTMIX to use BWTS it can't be that hard unless he is using some oddball implementation for BWT.

    This could be used when the next person rewrites a new BZIPX program. He might decided if he understands both to actually start with the BWTS instead of some form of BWT. I realy think if BWTS was invented first no one would be amazed by or even use the BWT form since it add extra data to a file and the fact is UNBWT is not even valid for most data sets.

    Look I am an old programmer not exactly sure how all routines work together MAKE is not my thing. But if your interested we could work together on this. The first step would be actually to see if using his source code and simple compiles can one actually build an executable that works. When I played with paq the hardest part what see how it flowed since I am more use to straight coding. But once I was able to actually get it to compile and run. Changing it was easy.

    I would look at this as more of a test project. I would expect that the BWTS version might be longer since carrying around an index that has to be written to a file and never used. Its more of a proof of concept. Also since all the after stages have been
    tuned with his version of a BWT it might be less tuned to a real BWTS version.
    The only real value would be to show what BWT should have been when first invented instead of adding a useless index. Even you think of it as several short blocks based in the lyndon words. But its not, suppose there are 100 lyndon words in a large file spaced evenly. If you are taking BWT of something that has a bunch of " THE" you would expect that the SPACES would bunch together because of the " THE" every were in the file. Well that also happens in the BWTS the spaces from the " THE" will sort to same area of output file regardless of which lyndon word the string came from. But yes one of he lydon word boundaries could cut that string but not often.

    When the next programmer comes along and writes BZIPX if he understands the difference between the many versions of BWT and BWTS he may decide to use BWTS instead. You have to admit that if BWTS was invented first no one would use BWT

  8. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I am not so sure because people care more about compatibility, speed, and compression ratio. BWTS saves a 4 byte index, but is it faster? It might be more interesting to try BWTS in a newer compressor like BSC or BBB. BSC is very fast, and BBB uses 1.25n memory which could be adapted to BWTS I think. Both are open source and compress much smaller than BZIP2.

  9. #9
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Look I start and stop several things. But reach a point of boredom and then stop. I realize looking at BZIP2 code that much of it is familiar. And yes I tested some modified versions and it does seem to do the same transform for blocks that are a single lyndon word so that it would be possible to drop BWTS in the code and create a version that would compress the same or better than current BZIP2 while maintaining compatablity with the current version of BZIP2. What one could do is compress each block using BOTH his BWT version and the BWTS version when BWT shorter use it. When BWTS version shorter use it with the index value set to zero. When BWT returns index of zero its the same as BWTS so no problem except it would be SLOW and would not save much space at all.

    However what I really want at this point is a standalone bijective compressor using BWTS and I do not have one on the net. I want to write one similar to BZIP2 but I first want to focus on the binary BWTS. I will write a simple ( for a binary 2 state ) version first. And will compress to a string of Fibonacci numbers first. I will use a modifed Yuta BWTS and UNBWTS since I like his coding style. I want to put first version on this site in a few days. It may only work for one block of a million bytes at first. But I will evolve it to use more than one block and an bijective entropy coder as last stage. When I get bored with this will move on to the 256 state BWTS and maybe higher. Look at will not be super fast. The first version will use my bit input output routines and only and the 2 state BWTS followed by a converting to the Universal numbers. Of courst it will be all bijective. Its a small start and I hope I can keep my word.

    Some day I might mod BBB but I hope others get curious enough about BWTS to do it so I don't have to.

  10. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I don't think it is possible to make bzip2 BWTS and keep compatibility. I would just write a new compressor.

    BBB is far from optimal. I wrote it to illustrate low memory BWT. A better compressor would replace quicksort() with libDivSufSort or MSufSort. Of course you could do this with BWTS if you divide the input into Lyndon words first. The context model could also be made faster by coding run lengths.

  11. #11
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I like you but it is possible to make BZIP2 use BWTS and still be able to uncompress files created with regular BWT version of BZIP2. The reason is that the index when ZERO the BWTS and BWT the same. When you compress a block with BWTS just
    set the index to zero all the time.

    So on when you uncompress you check the index if ZERO use UNBWTS for that block since if made with BWT it would get same answer. If the index not ZERO you use the way that is currently in BZIP2. So it would work. Or I am not sure what you mean by incompatible.

    If I could get paid by your company as a test project I would have higher motivation. They could use it as a test of my ability to do something. The pay could be ZERO if it fails to work. But since I would write all blocks using BWTS and still have the zero index It would most likely not compress as well as the current code. I would not want to code where I select which of the two to use and then use the best for each block because that would be a lot harder and very slow.

  12. #12
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts

    pbzip2 v1.1.5 - windows binary ?

    @encode

    at the beginning of this tread you have compiled bzip 1.0.6

    can you please try to compile a windows binary of the parallel implementation of bzip2

    from http://compression.ca/pbzip2/ ?

    the actually version is v1.1.5 (Jul 16, 2011)

    (The output of this version is fully compatible with bzip2 v1.0.2 or newer.)

    best regards

  13. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    What about incorporating Mori's patch for bzip2 (which generally replaces Seward's sorting algorithm with libdivsufsort) into pbzip2?

  14. #14
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    702
    Thanks
    210
    Thanked 267 Times in 157 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    pbzip2?
    Didn't LZMA make bzip2 more or less irrelevant, and later Brotli even more so?

Similar Threads

  1. Linux Kernel community phases out Bzip2 for source bundles
    By moinakg in forum Data Compression
    Replies: 10
    Last Post: 10th January 2014, 02:39
  2. jpegoptim 1.2.4 for Windows
    By nikkho in forum Data Compression
    Replies: 23
    Last Post: 19th June 2013, 00:22
  3. Parallel BZIP2 (PBZIP2)
    By joerg in forum Data Compression
    Replies: 21
    Last Post: 18th January 2012, 23:33
  4. bzip2 dictionary size
    By Wladmir in forum Data Compression
    Replies: 3
    Last Post: 7th April 2010, 17:09
  5. bzip2 1.05 optimized compile
    By M4ST3R in forum Download Area
    Replies: 0
    Last Post: 21st September 2009, 20:49

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •