People have observed that the BWT is similar to the Fourier Transform (aka FFT, or, really, the DFT. I'll call it the DFT). I didn't know much about the DFT when I came across that statement, so I wasn't sure how deep the analogy went. It could resemble it only superficially, or there could be deep connections between them. Given that the DFT is based on very old math, and it has gotten lots of study, there could be a lot of insight in finding connections between it and the BWT, and might point the way to other kinds of transforms with uses in compression, and efficient ways to implement them.
I've been looking into the DFT lately, and I didn't think so at first, but now I think the BWT can be interpreted as being a transform in the same family as the DFT.
You can interpret data such as text as the coefficient vector of a polynomial with coefficients over a finite field mod 256. Alternately, you can interpret it as the point values of such a polynomial. The polynomial interpretation makes it applicable to finite field versions of the DFT. Text isn't made of sine waves -- at least, not in any way I know of -- so the usual sine waves have no advantage as basis functions, but lots of functions can be used, as long as they have specific properties. I've noticed that the functions corresponding to different "frequencies" are basically the same coefficients under different permutations (this isn't always 100% true, but it's true when the number of samples is prime). The coefficient vector itself can be interpreted as a finite ring or field, where the elements are the set of polynomials that are rotations. Enumerating the elements of this ring in lexicographic order gives the same order as the BWT (specifying it that way gets away from sorting, which doesn't fit in to the DFT). The symbols corresponding to the BWT can defined to be remainders from polynomial division.
I haven't worked out the all the details of how to formally make the BWT into a Fourier transform, but there seems to be a lot to go on, and there is probably at least one way to do it. Interpreting text as polynomials can be a useful way to think about compression algorithms, with connections to ring theory and field theory. For example, the way LZ77 breaks down text into duplicated fragments is related to factoring a polynomial. There may be more systematic ways of factoring that come out of existing theory.
This is something I've been thinking about. I'll use this thread to post whatever interesting things I find.
Edit: I realized that if you enumerate the rotated polynomials in order, that's pretty much all of it. Just drop all but the last term and it's done. So that trivializes it. Any way you look at it, the BWT just permutes, it doesn't mix together all the terms like the DFT, so it's simpler. I think there's fertile ground in the space in between them, though, and a good place to look for ideas.