Thread: BWTS GENERAL COMPRESS in MinGW exe's

  1. #1
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    BWTS GENERAL COMPRESS in MinGW exe's

In the first posting I only matched M99, and the executables were
    big. So I shrank BWTS and UNBWTS by allowing only an 800k sort
    block.

For RLEQ I made a change that I should have done years ago but
    was too lazy to do; I also eliminated storing the whole file in
    an array.

Changing RLEQ forced changes in UNRLEQ, and ARB25X became
    ARB25Y.

It now compresses the 18-file Calgary corpus to 865,838 bytes,
    though with tuning it could be better. Maybe next time I will use
    a different RLEQ, but this one seems to work better on short
    files.

The code should be bijective. I have only tested it on two sets of
    files: the Calgary corpus, which decompresses to some 15 megs, and
    the bananas files. The bananas all compress to 5 bytes except
    for one, which compresses to 6; try that with any other full-service
    OLD STYLE BWT compressor.

    http://bijective.dogma.net/bwtsvtmp.zip

The zip contains all the source, all the exes and bat files, as
    well as logs of the bananas test and the Calgary test.
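
    If you want to script the check yourself, bijectivity is just two round trips: compress then uncompress must give back the input, and uncompress then compress must give it back too. Here is a minimal byte-compare in C for the final match step (a fresh sketch, not code from the zip):

    Code:

    /* match.c - byte-compares two files for the round-trip tests */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) { fprintf(stderr, "usage: match f1 f2\n"); return 2; }
        FILE *a = fopen(argv[1], "rb");
        FILE *b = fopen(argv[2], "rb");
        if (!a || !b) { fprintf(stderr, "open failed\n"); return 2; }
        int ca, cb;
        do {                        /* EOF against a byte also counts */
            ca = fgetc(a);          /* as a difference, so unequal    */
            cb = fgetc(b);          /* lengths are caught too         */
            if (ca != cb) { printf("files differ\n"); return 1; }
        } while (ca != EOF);
        printf("files match\n");
        return 0;
    }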

    Take Care
    David A. Scott

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    At last testable.
    Bookstar: 11448584, ~200s.
    TCUP: 123225958, ~1700s.

Remember the performance issues I reported a while back?
    They happen with BMP.

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Red face

    Quote Originally Posted by m^2 View Post
    At last testable.
    Bookstar: 11448584, ~200s.
    TCUP: 123225958, ~1700s.

Remember the performance issues I reported a while back?
    They happen with BMP.
Look, this code is about doing full bijective compression;
    speed is not yet a concern in this demo version.

In the test *.bat files it also decompresses and recompresses back, so in effect you're doing three extra sets of runs. Second, as Matt found out, MinGW barfs on my source code; in this case arb25y and unarb25y are super slow because the compiler screws up the optimization. The DJGPP versions of those two are much faster. Maybe some day MinGW will be fixed and it will speed up more, but it's a start. I don't have Vista; this was done only to help show people here the usefulness of BWTS. In fact I thought about using OPENBWT, which is faster, but I don't understand the so-called USE statement, and I would not want to piss the guy off, so I'm not sure how to include it. Especially since it could be made even faster with a few minor mods.

Also, it's made, at least in my mind, for short files. Second, you can't be sure how big a random or arbitrary file will decompress to; it could decompress to several terabytes. I haven't found one, but I am sure anyone following the code could easily write a file that decompresses to a size that causes some sort of error. This was a common problem in the decompressor until I started using my stability IO, but that does not mean it catches everything. One could put a counter in so that, say, if a gig of output occurs on decompression it aborts, or whatever you want; you have the full source.
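
    Something like the sketch below would do it (decode_next() here is a made-up stand-in for the real decoder's inner loop, not a routine from my source):

    Code:

    #include <stdio.h>
    #include <stdlib.h>

    #define OUTPUT_LIMIT (1ULL << 30)   /* abort past 1 gig, as suggested */

    extern int decode_next(void);  /* stand-in: next byte, or -1 at end */

    void decode_with_cap(FILE *out)
    {
        unsigned long long written = 0;
        int c;
        while ((c = decode_next()) != -1) {
            if (++written > OUTPUT_LIMIT) {
                fprintf(stderr, "output passed the limit, aborting\n");
                exit(1);
            }
            fputc(c, out);
        }
    }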

I never tested bookstar or TCUP. Remember, if you're testing with the bat file, the *.4 files are the compressed results and the *.40 files are the decompressed results of the original file.

    *.1 The BWTS output
    *.2 The simple MTFQ output
    *.3 The simple RLEQ output
*.4 The output of the entropy coder arb25y

If the input file is all "A..Z" and/or "a..z", then at this stage
    of the game the *.1 to *.3 files should all be viewable as pure
    ASCII; with control characters it's all messed up visually.
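
    If you want to see what the MTF stage is doing, plain textbook move-to-front looks like the sketch below. This is only the generic transform, shown for reference; my MTFQ is a bijective variant and differs in the details.

    Code:

    /* mtf.c - textbook move-to-front encode of stdin to stdout */
    #include <stdio.h>

    int main(void)
    {
        unsigned char table[256];
        int i, c;
        for (i = 0; i < 256; i++) table[i] = (unsigned char)i;

        while ((c = getchar()) != EOF) {
            int rank = 0;
            while (table[rank] != (unsigned char)c) rank++; /* find rank */
            putchar(rank);                                  /* emit rank */
            /* slide the table down and move the symbol to the front */
            for (; rank > 0; rank--) table[rank] = table[rank - 1];
            table[0] = (unsigned char)c;
        }
        return 0;
    }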

    Take Care
    Dave

  4. #4
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Red face

I forgot to add that even though it's designed for small files, there are simple tweaks to tune it for large files. Also, to make the Vista people happy, the BWTS buffer size is 800k. The quickest way to improve on longer files, besides several obvious tweaks, is to make the BWTS buffer bigger, which means on most Vista machines people will complain it doesn't run. You have the code; it should not be that hard to change the allocation to malloc or new or whatever to allow a larger buffer. But I can't please everyone all the time.
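
    The change is only a few lines; something like this sketch (block_size and block are names of my own choosing, not the ones in my source):

    Code:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        size_t block_size = 800 * 1024;        /* the current fixed default */
        if (argc > 1)
            block_size = (size_t)strtoul(argv[1], NULL, 10);

        unsigned char *block = malloc(block_size);
        if (block == NULL) {
            fprintf(stderr, "can't allocate a %lu byte BWTS block\n",
                    (unsigned long)block_size);
            return 1;                          /* fail cleanly, don't crash */
        }
        /* ... read input into block and run the BWTS sort on it ... */
        free(block);
        return 0;
    }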

How big is bookstar before compression?
    How big is TCUP before compression?

    Assuming you used the bat files: how big was bookstar after
    decompression, and the same for TCUP? Also, when you compressed
    and decompressed, did the result match the starting file? And when
    you decompressed and recompressed, assuming you tested that, did
    the starting and ending files match?

    Thank You
    Dave

    P.S.
Again, this is test code; it can be made much faster. It also has
    a large zero-frequency problem that could be fixed by changing
    the way I did RLEQ; I only did it the way I did so it would be easy
    to see how RLEQ and UNRLEQ work. Obviously it's not the best, which
    is why it surprised me that it did better than M99 or RLE+M99 on the
    Calgary 18-file test. There is much room, much of it obvious, to
    make this code compress smaller and still be bijective.

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
The problem is that it seems to stall. I believe you when you say it works, but on BMP it's way too slow to be usable; PAQ8 seems like a rocket in comparison.
    I suggest greatly reducing the BWTS block size; then it should be OK.

    I modified the bat to remove unnecessary parts.

    Bookstar
    TCUP

  6. #6
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Thumbs up

    Quote Originally Posted by m^2 View Post
The problem is that it seems to stall. I believe you when you say it works, but on BMP it's way too slow to be usable; PAQ8 seems like a rocket in comparison.
    I suggest greatly reducing the BWTS block size; then it should be OK.

    I modified the bat to remove unnecessary parts.

    Bookstar
    TCUP
Shortly after I wrote my last message I found info on bookstar.
    Various people have various goals in writing compression; mine are more theoretical, about what can be done. Mine is slow. I think one person on this site has called me a bijective maniac; it's about what can be done, not how fast. For my taste computers will always get faster, until the barbarians destroy what little science we have left. Like I said, the MinGW compiler seems especially prone to slow code; too bad you can't use DJGPP in Vista. Also, if anything, I would use larger buffers for the BWTS. If you want better speed and compression, fix those two things. Also, even in openbwt the guy used my two comparison routines more or less intact, and there are far faster ways to do this. I suspect if BWTS becomes popular its speed will rival that of the current flavor of BWT.

I am not going to go much further with this current version. But I would like some small sample DNA files. I often think of writing code for DNA, since I don't think there are good compressors for it yet, and I think my methods may work there, and maybe speed is not a problem either.

I also think that bijective compression could be a good thing when compressing and encrypting, but hey, that's just me. I also believe that even if it cannot be sped up, it would still be a plus to use it before a bijective encryption pass. But no one follows me, and I try not to follow the herd.

    Take Care
    David Scott

    P.S.
    Thanks for testing it.

  7. #7
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by biject.bwts View Post
Various people have various goals in writing compression; mine are more theoretical, about what can be done. [...] Also, if anything, I would use larger buffers for the BWTS. If you want better speed and compression, fix those two things. [...]
What's the worst-case BWTS complexity, parametrized by block size?
    Right now it practically doesn't work on BMP; check for yourself.

I'm testing with the OPENBWT BWTS; it's OK, though not fully bijective (it saves the block size). And quite strong.
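
    My back-of-the-envelope guess, assuming the sort does plain pairwise rotation comparisons (which is what your comparison routines sound like): for an n-byte file in blocks of B,

    $$T(n,B) \;=\; \underbrace{\frac{n}{B}}_{\text{blocks}} \times \underbrace{O(B \log B)}_{\text{comparisons per block}} \times \underbrace{O(B)}_{\text{worst cost per comparison}} \;=\; O(n\,B \log B).$$

    Long runs of identical bytes, as in an uncompressed BMP, are exactly what pushes each comparison toward its O(B) worst case, while ordinary text stays near O(1) per comparison; a suffix-array style sort would cut this to roughly O(n log B) or better.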

  8. #8
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    With OpenBWT BWTS using 64 MB block:

    BMP: 22505537, ~220s
    Bookstar: 10161700, ~140s
    TCUP: 114062119, ~1000s

  9. #9
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I made a Visual Studio compilation of arb25y.
    On bookstar it cut coding time from ~116 to ~106s.

How about making a bijective version of FastAC?

ADDED: As usual, I forgot the attachment.
Attached Files
    Last edited by m^2; 23rd January 2009 at 22:13.

  10. #10
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
There is a lot of waste and overlap in my code, since it's development code. For one thing, I treated this sort of like 3 separate 256-symbol coders. Yet that is not really what I am doing: I have 256 symbols for the 0 set, then add only eight cells for the 1 set (though I waste time and space by copying between the sets, which is doing 3 times too much work). I am using over 700 binary cells but needed fewer than 270. Also, the model cost is higher than it should be (which results in longer compressed files). I'm sure the first thing one notices is that it's foolish to use such a large bit symbol for what is essentially a binary count, but I opted for ease of viewing the MTF and RLE phases. If one got serious, even using a method like this, I would do those far differently, but this gives a flavor of what the other would be like. It also has to play games because of the zero-frequency problem. Anyway, it was fun. Like I said, I was shocked it beat M99 and RLE-M99, since those were designed by people who should know more than me, and mine at this point is just a toy.


There is no need to save the BLOCK SIZE; it's wasted space as far as I am concerned. If you run it with, say, 64-meg blocks, then you should automatically use however many 64-meg blocks it takes to cover the file. Of course the last block will automatically be whatever is left, and usually much shorter.
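
    The arithmetic is trivial; a sketch, with names of my own choosing:

    Code:

    #include <stdio.h>

    /* Cover a file with fixed blocks, storing nothing: every block is
       exactly block_size bytes except the last, which is the leftover. */
    void for_each_block(unsigned long long file_size,
                        unsigned long long block_size)
    {
        unsigned long long off;
        for (off = 0; off < file_size; off += block_size) {
            unsigned long long len = file_size - off;
            if (len > block_size)
                len = block_size;              /* last block is shorter */
            printf("block at %llu, length %llu\n", off, len);
        }
    }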

I hope you check that files decompress back to the starting file. Sadly, if you include the block size you can't test whether it's fully bijective. Anyway, glad you're looking at it.

I was thinking of adding a parts option, since file sizes before and after BWTS are exactly the same. You could make options like -s3, meaning break the file into 3 segments. Or even stuff like -s3 -10M, meaning use 3 segments if each is less than 10M; if not, use 10M blocks until only 3 blocks are left, then break those into 3 segments.
    Or options like -ss5 -10M -sse6, which would be an attempt to fix the beginning block size, the middle block size, and the trailing one.

I thought about writing an archiver. Of course it would not be bijective, but if one does like PAQ and stores the uncompressed length of each file, one could use that info to decide whether or not to group files together in the BWTS transform phase. If I use the current MTFQ and RLEQ, those stages could be done together in single passes. The ARB25Y phase could again be run as a single pass or with multiple resets; I may do this some day. The archive would have a sort of automatic test, since the uncompressed lengths are in the header: if a file decompresses short, you have an error in the file, and if it tries to decompress too much, you stop and write an error code.
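
    A sketch of that automatic test (decompress_member() is a made-up name, not real code):

    Code:

    #include <stdio.h>

    /* Stand-in, not real code: decode one archive member, writing at
       most limit bytes; returns bytes written, or -1 if the stream
       tried to run past the limit. */
    extern long long decompress_member(FILE *in, FILE *out, long long limit);

    int check_member(FILE *in, FILE *out, long long stored_len)
    {
        long long got = decompress_member(in, out, stored_len);
        if (got < 0) {
            fprintf(stderr, "member ran past its stored length\n");
            return 1;
        }
        if (got != stored_len) {
            fprintf(stderr, "member came up short of its stored length\n");
            return 1;
        }
        return 0;   /* length matched: the built-in test passed */
    }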

    Take Care

  11. #11
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by biject.bwts View Post
There is no need to save the BLOCK SIZE; it's wasted space as far as I am concerned. [...]
Yeah, I know that the block size is easily removable, but as it's just a test, I didn't bother to recompile.

  12. #12
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    114
    Thanks
    11
    Thanked 88 Times in 26 Posts
    Quote Originally Posted by biject.bwts View Post
Anyway, it was fun. Like I said, I was shocked it beat M99 and RLE-M99, since those were designed by people who should know more than me, and mine at this point is just a toy.
    Be fair now, David. You're comparing your results to a version that is almost a decade old!

    You might want to compare to 2.2 which has an option for fast mode vs max mode (bit packing only vs. using range coder) and an option to enable/disable RLE for a more modern comparison.

    http://www.michael-maniscalco.com/do...s/m99.v2.2.zip

    Actually, I'm shocked that this old website was still around after so many years. I guess geocities doesn't delete unmaintained sites at all.

    - Michael

  13. #13
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by michael maniscalco View Post
Be fair now, David. You're comparing your results to a version that is almost a decade old! [...]
OK, I will try to be fair, but when one searches Google for compression of the 18-file Calgary corpus with BWT, yours was about the only result I could find that shows the total bytes for the 18 Calgary files as well as a table showing the compression of each file. It surprised me that other sites that talk about Calgary only use a subset of the 18.
    Do you know where on the net there is an equivalent table for your M99 v2.2? I did a Google search today for "m99.v2.2 calgary" and nothing came back. I could make such a table myself, but I don't think that would be fair!
    I think a lot of current BWT research is done only to write papers and is not available to the common man. Even when I reviewed a paper for the IEEE (of which I am not a member), I was not given access to the code that actually made the tables; the man running the show said that's how reviews are done today. I wrote a binary version that compared my binary BWT against a paper where some guy wrote a binary BWT, but I gave up trying to actually find his code. The only code I have compared my BWTS with before was Mark's; his was the first, and I suspect it's what most people compare against.
    I don't think using 8-bit 256-symbol coding all the way to the last stage is the best idea, but it was fun doing it. If you publish the equivalent table for M99 v2.2, be sure to run against the same exact file set with all the options the same, OK?
    If you don't want to do that, at least give me the exact command-line options you would use. I would hate to waste time chasing a goal only to have you say I should have used the m99 -x3 -yws option. Please, you be fair too.

    Thanks
    David Scott

  14. #14
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
Don't get me wrong: my goal here is to more or less prove that BWTS is usually better than old-style BWT, where there is a wasted expansion of the data during the sorting pass. I was hoping people who try to squeeze out the last byte would use BWTS. I hope you give it a shot when you recompile M99 v2.2, just to see what it does! Sadly, the zip file you reference only contains an exe, so it's hard to say what you used; maybe it's already updated and uses a BWTS instead of an old BWT.

But I would still like you to produce a similar table, since you did it for M99 with and without RLE for the original 18 files, so a fair test can be done.

    Thanks
    David Scott

  15. #15
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

How to use M99 v2.2

Michael Maniscalco suggested I test against what he said was version 2.2; well, I downloaded the file and it was v2.2.1. For some reason he never stated what it gets on the 18-file Calgary corpus. You can find M99 and RLE+M99 version 1 on the internet, I guess. But after being asked to test against the one he recommends, it's not at all clear to me what I should be aiming for. He gives no clue as to what options one should use. I took a blind stab at many options; his source code isn't there. The best I got when his code ran was 839,149, yet I am not sure those are the options he thinks are best.

    Does anyone have a clue as to what options should be used so that a comparison of BWT compressors can be made? I would like to compare BWTS to a BWT version, but trying to test against what he suggests is kind of hard when one does not know what his code does on the 18-file Calgary. If anyone knows, or if Michael can state what options produce his totals, I would have something to play with. He did this with the old code; is there a reason he doesn't wish to compare the newer version to the older?

  16. #16
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    The "fair test" of Micheal's

I don't think of it as a totally fair test, but it's close.

Michael allowed me to download M99 v2.3, I guess his latest.
    It has a -s option which turns off the BWT, which means it
    accepts an already-BWTed file. I'm not sure how to use it to get
    close to how it compresses with the internal BWT, since you
    must handle an INDEX and BWTS does not have one.
    I tested his code, though I'm not sure I have the right one: he
    thought it should compress to about 739,000, which is not even
    close using the options he suggested.

Working with what he gave me, his code compressed the Calgary
    18-file set smaller than mine every time, but by less than 200 bytes:
    the best with his internal BWT was 839,149, and the best with my
    external BWTS was 839,313. I'm not sure what the best external BWT
    result is, since it's not clear how the code handles a real BWT and
    the index. I guess one reason it does poorer on mine is the INDEX
    thing, or the TERMINATION thing. But at least I have a new goal to
    shoot for in my own BWTS compressor.

So thank you, Michael

    David

  17. #17
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    The "fairer test" of Micheal's

I downloaded Michael's BWT (Msu...3.1.1) and used it as the front end to his M99 v2.3 with the -s option. The best it got for all 18 files was 839,324, versus 839,313 using BWTS in front. I like that!!

Basically it shows BWTS is as good as OLD BWT that carries an index. The advantage is that with old-style BWT plus index it will be hard to match the best PPM compressors, since those are easy to make bijective and OLD-style BWT is not.
    As speeds improve, why use a BWT when BWTS does not expand the data and seems to compress as well? It also has the chance of being in a compression program that matches or beats bijective PPM.

    Take Care
    David Scott

  18. #18
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

More info from Michael

Not sure how much further I will go; I think I have shown BWTS is as good as BWT. I found out from Michael that he made a typo: his latest version does compress to around 839,000 bytes for the 18 files, not 739,000, so I was using his code correctly. There is also some overhead difference when using the -s option, but that seems minor. It does seem to work.
    Take Care
    David

  19. #19
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

Hopefully an easier-to-use full BWTS bijective compressor

What I really did was use the faster and better BWTS routines,
    create simpler compress.bat and uncompress.bat files, and compile
    with MinGW, with minor changes in the files to prevent compile errors.
    I have not tested many files, but it seems to work, and each stage
    is bijective. If you have any problems let me know. It uses a single
    block for the BWTS, so if you have memory problems it will fail.
    It might fail anyway.

    All the source code is included for anyone wishing to play
    with bijective compression.
Attached Files
