http://ctxmodel.net/files/mix_test/BWTmix_v0.rar
BWTmix v0 = BWT+o1rc+unBWT from mix_test_v8
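For reference, here is a rough sketch of the BWT and unBWT stages of such a pipeline. It is a minimal illustration, not BWTmix's actual code. It assumes rotations are sorted with the C library qsort and a naive cyclic comparator, and it inverts the transform with the standard LF mapping. The names bwt, unbwt and cmp_rot are hypothetical. Because the comparator may scan up to n bytes per comparison, a qsort-based sort like this one slows down badly on repetitive data. That is where a dedicated BWT/suffix sort wins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Globals for the qsort comparator (standard qsort has no context argument). */
static const unsigned char *g_buf;
static size_t g_n;

/* Compare two cyclic rotations of g_buf starting at offsets *a and *b. */
static int cmp_rot(const void *a, const void *b) {
    size_t i = *(const size_t *)a, j = *(const size_t *)b;
    for (size_t k = 0; k < g_n; k++) {
        unsigned char ci = g_buf[(i + k) % g_n], cj = g_buf[(j + k) % g_n];
        if (ci != cj) return ci < cj ? -1 : 1;
    }
    return 0; /* identical rotations (periodic block) */
}

/* Hypothetical forward BWT: out[k] = last byte of the k-th sorted rotation.
   Returns the row of the original block, which unbwt needs. Requires n > 0. */
static size_t bwt(const unsigned char *in, unsigned char *out, size_t n) {
    assert(n > 0);
    size_t *idx = malloc(n * sizeof *idx);
    if (!idx) { perror("malloc"); exit(1); }
    size_t primary = 0;
    for (size_t i = 0; i < n; i++) idx[i] = i;
    g_buf = in;
    g_n = n;
    qsort(idx, n, sizeof *idx, cmp_rot); /* the slow part: naive rotation sort */
    for (size_t k = 0; k < n; k++) {
        out[k] = in[(idx[k] + n - 1) % n];
        if (idx[k] == 0) primary = k;
    }
    free(idx);
    return primary;
}

/* Hypothetical inverse BWT using the LF mapping: walk backwards from the primary row. */
static void unbwt(const unsigned char *L, size_t n, size_t primary, unsigned char *out) {
    assert(primary < n);
    size_t count[256] = {0}, start[256];
    size_t *lf = malloc(n * sizeof *lf);
    if (!lf) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < n; i++) count[L[i]]++;
    size_t sum = 0;
    for (int c = 0; c < 256; c++) { start[c] = sum; sum += count[c]; }
    /* lf[i] = (number of bytes < L[i]) + (occurrences of L[i] before position i) */
    for (size_t i = 0; i < n; i++) lf[i] = start[L[i]]++;
    size_t p = primary;
    for (size_t i = n; i-- > 0;) { /* L[p] is the byte preceding the current rotation */
        out[i] = L[p];
        p = lf[p];
    }
    free(lf);
}
```

For example, "banana" transforms to "nnbaaa" with primary row 3, and unbwt restores the original block. In the full pipeline, the order-1 CM/rangecoder stage would sit between these two calls.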
Usage:
BWTmix c book1 book1.ari -- compress
BWTmix d book1.ari book1.unp -- decompress
BWTmix c7 book1 book1.ari -- compress book1 with block size of 7*100000=700k
comp.size (bytes)  enc.time   dec.time    input
 20621695           177.453s   102.437s   enwik8
 20744613            46.000s    35.468s   enwik8 + bcm008
167978527          2104.094s  1019.938s   enwik9
Times were measured on a Q9450 @ 3.52 GHz (440x8), with files on a ramdrive.
Well, I guess my qsort is slow compared to a proper BWT sort.
And as to CM, it's not really that hopeless; it could probably be
optimized to the same speed as bcm008, e.g. with vectorized mixing.
Also, this is open source, and bcm is not.