For BWT there's an option to apply bitwise arithmetic coding to an already compressed bitcode.
As an example, bsc/qlfc already achieves a 4x bit-count reduction just by converting symbols to ranks/RLE.
The nature of BWT data having lots of similar symbols adjacent to each other lends itself well to a bytewise arith coder that shuffles the order of symbols based on what's recently been seen. That ought to make the search faster. Either that or SIMD CDF update functions.
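The "shuffle symbols by recency" idea above is essentially move-to-front coding. A minimal sketch (not bsc's actual implementation) of how a recency list turns BWT-style runs into mostly-zero ranks, which a bytewise coder can then model cheaply:

```python
def mtf_ranks(data):
    """Move-to-front sketch: map each byte to its rank in a recency list.

    On run-heavy BWT output, the just-seen symbol sits at the front of the
    list, so most emitted ranks are 0 and the rank alphabet is very skewed.
    """
    recency = list(range(256))
    ranks = []
    for b in data:
        r = recency.index(b)       # linear scan; real coders vectorize this
        ranks.append(r)
        recency.pop(r)             # move the symbol to the front
        recency.insert(0, b)
    return ranks
```

The `recency.index` scan is the "search" the post refers to; the point of reordering by recency is that, on BWT data, that search almost always terminates at position 0.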
With a pure o1 range coder (not ANS) on a 1 GB pre-BWT'ed copy of enwik9, I get 180125672 bytes at 57 MB/s encode and 52 MB/s decode (no threads).
With the built-in RLE+o1 joint coder it's 174962203 bytes at 62 MB/s encode and 57 MB/s decode.
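The details of that joint coder aren't given; one plausible sketch of the RLE pre-pass such a coder could build on is collapsing each run into a (symbol, length) pair, so the entropy stage codes far fewer events on run-heavy BWT output:

```python
def rle_pairs(data):
    """Hypothetical RLE pre-pass: collapse each run of equal bytes into a
    (symbol, run_length) pair. A joint coder would then model the symbol
    and the length together instead of coding every repeated byte."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                 # extend the current run
        out.append((data[i], j - i))
        i = j
    return out
```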
Add in LZP and it's probably a little smaller / faster.
This isn't using any clever CDF stuff. What sort of speed are you after for the coder side of things? Static 128 KB blocks with RLE + order-1 rANS get me 190634138 bytes, which is poor, but at ~150 MB/s encode and 350 MB/s decode. The ratio isn't great, and static frequencies aren't a good fit for BWT. Maybe some delayed update, making it quasi-adaptive, would be a good middle ground.
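The delayed-update idea can be sketched as follows (names and the rebuild interval are illustrative, not from any of the coders mentioned): counts are accumulated on the side and only folded into the coder's table every N symbols, so the hot loop still uses a static table between rebuilds.

```python
def quasi_adaptive_freqs(data, rebuild_every=4096):
    """Delayed-update sketch: the frequency table (from which the coder's
    CDF would be built) is rebuilt only every `rebuild_every` symbols.
    Between rebuilds the coder behaves like a static-frequency coder."""
    freq = [1] * 256               # table currently used by the coder
    pending = [0] * 256            # counts gathered since the last rebuild
    rebuilds = 0
    for i, b in enumerate(data):
        pending[b] += 1
        if (i + 1) % rebuild_every == 0:
            freq = [f + p for f, p in zip(freq, pending)]
            pending = [0] * 256
            rebuilds += 1
    return freq, rebuilds
```

This trades a little ratio (the model lags the data by up to one block) for a much cheaper inner loop, which is the middle ground the post suggests.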
I guess it could be interesting to experiment with nibble alphabets if it's non-binary.
For example, encode the low nibble plus an escape, and the high-nibble rank only after the escape.
Nibble CDF update can simulate bitwise model and nibble rank update can be done with a single vector instruction.
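One reading of the escape scheme (this is an interpretation, not a spec): code each byte as its low nibble when the high nibble matches an expected value, and otherwise emit an escape token followed by the high nibble, giving a 17-symbol alphabet per context.

```python
ESC = 16  # escape token in a 17-symbol alphabet (16 nibbles + ESC)

def split_nibbles(data, common_high=0):
    """Hypothetical nibble+escape tokenization: a byte whose high nibble
    equals `common_high` costs one token (its low nibble); any other byte
    costs ESC + high nibble + low nibble. A real coder would replace
    `common_high` with an adaptive high-nibble rank model."""
    out = []
    for b in data:
        hi, lo = b >> 4, b & 0xF
        if hi == common_high:
            out.append(lo)
        else:
            out.extend((ESC, hi, lo))
    return out
```

The attraction is exactly what the post says: a 16/17-entry CDF fits in one or two SIMD registers, so both the CDF update and a rank-table update can be done with a handful of vector instructions.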
Okay, the FINAL version of BCMv1 is here:
https://github.com/encode84/bcm
It is fully compatible with previous versions.
My next BWT-based file compressor will be either BCMv2 or a completely new one (might be SQUID - since it's originally based on a fast byte-wise arith coder)
BTW, results of my byte-wise o1 on enwik9.bwt:
It's close to 100 MB/sec!
Code:
Comp. size   = 180372557
Elapsed time = 11.205s
I'm thinking of testing the latest BCM for my benchmark [1] (where v1.30 currently participates). Therefore I have a question (and a feature request).
Does BCM still only work with files? If so, I'd like to request support for stdin/stdout (streaming mode) for the uncompressed data.
In my benchmark I use all compressors only in streaming mode (reading from stdin during compression, writing to stdout during decompression). This streaming mode is important for many practical applications where multiple tools pipe data to each other, and I want my results to be relevant for such applications. For compressors without a streaming mode, I therefore bolt one on via a wrapper script: the wrapper decompresses the data into a temporary file, then streams that file to stdout, and the TOTAL time is measured and compared with other compressors. Naturally, if a compressor can write to stdout by itself, the wrapper and temporary file aren't needed and its true speed can be seen.
[1] http://kirr.dyndns.org/sequence-compression-benchmark/
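The wrapper described above can be sketched in Python. Note this is a generic stand-in: the decompressor's argv (here assumed to take an input path and an output path) and the `cp` used in the test are placeholders, not BCM's actual command-line interface.

```python
import os
import subprocess
import sys
import tempfile

def stream_decompress(tool_argv, compressed_path, out=None):
    """Wrapper sketch for a file-only decompressor: run the tool into a
    temporary file, then stream that file to stdout (or any binary
    stream), so the whole step can be timed like a streaming tool.

    `tool_argv` is assumed to be a command that takes <input> <output>
    as its last two arguments (a hypothetical interface)."""
    out = out if out is not None else sys.stdout.buffer
    fd, tmp = tempfile.mkstemp()
    os.close(fd)
    try:
        subprocess.run(tool_argv + [compressed_path, tmp], check=True)
        with open(tmp, "rb") as f:
            out.write(f.read())
    finally:
        os.remove(tmp)
```

The benchmark's point stands: the temp-file round trip is pure overhead, so a compressor with native stdout support will always look faster than the same compressor behind this wrapper.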