Well, I couldn't stop ;)
http://ctxmodel.net/files/rcbwt_v2.rar
- Added frequency rounding to 14 bits (to avoid bitcode overflows)
- Chunk size is now dynamic
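For anyone wondering what "rounding to 14 bits" could look like in practice, here is a minimal sketch. It is only one plausible reading, not the code from the archive: it assumes each bit's statistics are a pair of counts (n0, n1) that get turned into a 14-bit probability of a zero bit, clamped away from 0 and 1 so that neither symbol ends up with a zero-width range. The names PROB_BITS and quantize_bit_freq are made up for illustration.

```python
PROB_BITS = 14                 # assumed precision, taken from "14 bits" in the changelog
PROB_ONE = 1 << PROB_BITS      # 16384, the fixed-point value standing for probability 1.0

def quantize_bit_freq(n0, n1):
    """Map counts of 0-bits and 1-bits to a 14-bit probability of a 0-bit.

    Hypothetical helper: the real rcbwt code may round differently.
    """
    total = n0 + n1
    if total == 0:
        return PROB_ONE // 2   # no statistics yet: assume 50/50
    # Round to nearest instead of truncating.
    p = (n0 * PROB_ONE + total // 2) // total
    # Clamp so both bit values keep a nonzero share of the range.
    return min(max(p, 1), PROB_ONE - 1)

print(quantize_bit_freq(3, 1))
print(quantize_bit_freq(100, 0))
```

With the clamp in place the result always fits in 14 bits and never reaches 0 or 16384, which is the kind of property that would keep a fixed-width code from overflowing.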
Code:
Input file size = 100000000 (enwik8)
Memory usage:
- o1 static bitcode model = 34736128
- clustering statistics = 4194304
- BWT index table = 5652984
- source data buffer = 53711205
TOTAL = 98294621
Input file size = 1000000000 (enwik9)
Memory usage:
- o1 static bitcode model = 34736128
- clustering statistics = 4194304
- BWT index table = 68491184
- source data buffer = 538268254
TOTAL = 645689870
And here, with enwik9, we finally have an example of BWT with 0.646N memory usage ;)
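The 0.646N figure can be rechecked from the numbers in the listing. A quick sketch (the dictionary keys are just the labels from the stats above, and memory_ratio is a helper name made up for this check):

```python
def memory_ratio(components, input_size):
    """Return (total bytes, total / input size) for a memory breakdown."""
    total = sum(components.values())
    return total, total / input_size

enwik8 = {
    "o1 static bitcode model": 34736128,
    "clustering statistics": 4194304,
    "BWT index table": 5652984,
    "source data buffer": 53711205,
}
enwik9 = {
    "o1 static bitcode model": 34736128,
    "clustering statistics": 4194304,
    "BWT index table": 68491184,
    "source data buffer": 538268254,
}

for name, parts, size in (("enwik8", enwik8, 100_000_000),
                          ("enwik9", enwik9, 1_000_000_000)):
    total, ratio = memory_ratio(parts, size)
    print(name, total, round(ratio, 3))
# prints:
# enwik8 98294621 0.983
# enwik9 645689870 0.646
```

The model and clustering statistics are the same 38930432 bytes in both runs, so that fixed cost is close to 0.39N on enwik8 but under 0.04N on enwik9, which accounts for most of the drop in the ratio.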
It still works though, and I don't think I have anything to verify that it's correct.
I'll try compressing the output with o2rc later and see what happens.