There is a lot of waste and overlap in my code, since it's development code. For one thing, I treated this sort of like 3 separate 256-symbol coders. Yet that is not what I am really doing: I have 256 symbols for the 0 set and then add only eight cells for the 1 set (though I waste time and space by copying between the sets, which is doing about 3 times too much work). I am using over 700 binary cells but needed fewer than 270. Also, the model cost is higher than it should be, which results in longer compressed files. I'm sure the first thing one notices is that it's foolish to use such a large bit symbol for what is essentially a binary count. But I opted for ease of viewing the MTF and RLE phases. If one got serious, even using a method like this, I would do those far differently, but this gives a flavor of what the other would be like. I also had to play games because of the zero-frequency problem. Anyway, it was fun. Like I said, I was shocked it beat M99 and RLE-M99, since those are designed by people who should know more than me, and mine at this point is just a toy.
There is no need to save the BLOCK SIZE; it's wasted space as far as I am concerned. If you run it with, say, 64Meg blocks, then you should automatically use however many 64Meg blocks it takes to cover the file. Of course the last block will automatically be whatever is left, and it is usually much shorter.
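As a rough sketch of why the block size need not be stored: if the block size is given on the command line for both directions and the transform preserves length, the block boundaries follow from the file length alone. The function name `block_ranges` and its parameters are made up for illustration and are not taken from the actual program.

```python
def block_ranges(file_len, block_size):
    """Return (start, end) byte ranges that cover file_len bytes.

    Every block is block_size bytes long except possibly the last one,
    which is whatever is left over. Nothing here needs to be stored in
    the output file: the same call with the same two numbers gives the
    same boundaries again.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(start, min(start + block_size, file_len))
            for start in range(0, file_len, block_size)]


if __name__ == "__main__":
    # A 150-byte file with 64-byte blocks: two full blocks and a short tail.
    for start, end in block_ranges(150, 64):
        print(start, end, end - start)
```

This only works for a stage whose output is the same length as its input, so that the inverse sees the same file length and can split it the same way.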
I hope that you check that files uncompress back to the starting file. Sadly, if you include the block size you can't test whether it's fully bijective. Anyway, glad you're looking at it.
I was thinking of adding a parts option, since the file size before and after BWTS is exactly the same. You could make options like -s3, meaning break the file into 3 segments. Or even stuff like -s3 -10M, meaning use 3 segments if each is less than 10M; if not, use 10M blocks until only 3 blocks are left, then break what remains into 3 segments.
Or options like -ss5 -10M-sse6, which would be an attempt to fix the beginning block size, the middle block size, and the trailing block size.
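Here is one possible reading of the -s3 -10M idea as a sketch. The exact rule is an assumption: peel off fixed-size blocks while more than 3 blocks' worth of data remains, then split the rest into 3 near-equal segments. The names `split_even` and `segment_ranges` are hypothetical, and no such options exist in the current program.

```python
def split_even(start, end, parts):
    """Split [start, end) into up to `parts` contiguous near-equal ranges."""
    base, extra = divmod(end - start, parts)
    ranges = []
    pos = start
    for i in range(parts):
        # The first `extra` ranges get one extra byte each.
        size = base + (1 if i < extra else 0)
        if size > 0:  # skip empty segments for very small inputs
            ranges.append((pos, pos + size))
        pos += size
    return ranges


def segment_ranges(file_len, segments, max_block):
    """Assumed meaning of '-s<segments> -<max_block>'.

    While more than segments * max_block bytes remain, emit a block of
    max_block bytes. Then break the remainder into `segments` pieces,
    each of which is at most max_block bytes long.
    """
    if segments <= 0 or max_block <= 0:
        raise ValueError("segments and max_block must be positive")
    ranges = []
    pos = 0
    while file_len - pos > segments * max_block:
        ranges.append((pos, pos + max_block))
        pos += max_block
    ranges.extend(split_even(pos, file_len, segments))
    return ranges
```

As with plain fixed blocks, the boundaries depend only on the file length and the options, so nothing extra has to be written to the output.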
I thought about writing an archiver. Of course it would not be bijective, but if one does like PAQ and stores the uncompressed length of each file, one could use that info to decide whether or not to group files together in the BWTS transform phase. If I use the current MTFQ and RLEQ, then those stages could be done together in a single pass. The ARB25Y phase could again be used as a single pass or with multiple resets; I may do this some day. The archive would have a sort of automatic test, since the uncompressed lengths are in the header: if a file uncompresses short, you have an error in the file. If it tries to uncompress too much, you stop and write an error code.
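The automatic length test could look something like the sketch below. It assumes the decoder hands back its output in chunks and that the expected length was read from the archive header. `LengthMismatch` and `collect_checked` are illustrative names, not part of any existing archiver.

```python
class LengthMismatch(Exception):
    """Raised when decoded output does not match the stored length."""


def collect_checked(chunks, expected_len):
    """Gather decoded chunks and check them against the stored length.

    Stops as soon as the output grows past expected_len, and reports an
    error at the end if the output came up short.
    """
    out = bytearray()
    for chunk in chunks:
        out.extend(chunk)
        if len(out) > expected_len:
            raise LengthMismatch("output is longer than the stored length")
    if len(out) < expected_len:
        raise LengthMismatch("output is shorter than the stored length")
    return bytes(out)


if __name__ == "__main__":
    print(collect_checked([b"ab", b"c"], 3))
```

A command-line tool would catch the exception and turn it into a nonzero exit code.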
Take Care