What is the current state of the art for BWT computation? I realize this is nearly the same as asking what the fastest string sorting routine is.

There are several good references online; BZIP2's page in fact talks about it, and Julian himself wrote a paper on BWT sorting speed. But that result is many years old now.

Two things I'm especially interested in. First, what's the current speed of BWT on, say, a standard corpus? Is it something on the order of 2 MB/sec or 200 MB/sec?

Note that this is just the BWT speed. I could run BZIP2 or whatever to measure an overall compression speed, but that doesn't isolate the BWT contribution.
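
To make concrete what I mean by timing the transform on its own, here is a minimal sketch of the kind of measurement I have in mind. It uses a naive rotation sort, which is nowhere near what a real suffix-sorting implementation does, so the numbers it gives say nothing about the state of the art. The function names `bwt` and `bwt_throughput` are just mine for illustration.

```python
import time
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive BWT: sort all rotations of `data`.

    Returns the last column of the sorted rotation matrix and the row
    index at which the original (unrotated) input appears.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    # Sort rotation start offsets by the contents of each rotation.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in order)
    primary_index = order.index(0)
    return last_column, primary_index


def bwt_throughput(data: bytes) -> float:
    """Time only the transform and return throughput in MB/sec."""
    start = time.perf_counter()
    bwt(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very small inputs.
    return len(data) / max(elapsed, 1e-9) / 1e6
```

Nothing downstream (move-to-front, Huffman, file I/O) is inside the timed region, which is the isolation I'm after; swapping `bwt` for a serious suffix sorter would give the number I'm actually asking about.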

Second, has anyone worked on parallelizing BWT? Sure, it can be parallelized by giving each thread its own block, but I mean having multiple threads work on the same block.

Thanks for any pointers/hints! There's a lot of data out there but it's hard to see where the modern speed & efficiency is at.