There are many variations of LZ78. I think even ACB ( http://encode.su/threads/540-ACB ) could be considered an LZW variation. David Scott has made a bijective LZW (although it is very slow).
I think it's better to invest time into BWT, as it's scalable and the preprocessing + transformation stage is almost fully decoupled from the encoding stage, i.e. changes in the first stage (e.g. switching from LZP to LZ77 preprocessing, or between BWT and BWTS) usually don't influence the performance of the encoding stage much. IMO any LZ coder with adaptive entropy coding is a messy thing: optimizing performance relies too much on tuning and trying ad-hoc combinations of constants, since LZ has an inherent information overhead (it allows multiple encodings of the same input string). With static entropy coding, on the other hand, it is sometimes feasible to do optimal parsing and then focus on improving speed, thus avoiding extensive tuning, which becomes useless anyway once a customer compresses files with different content than your data corpus.
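To make the decoupling claim concrete, here is a minimal toy sketch (not from the original post) of such a two-stage pipeline: the transform stage can be swapped without touching the second stage at all. The function names, the naive O(n^2 log n) BWT, and the use of plain MTF as a stand-in for the real entropy stage are all illustrative assumptions.

```python
# Toy sketch: swap the transform stage (naive BWT vs. identity) while the
# second stage stays untouched. Illustrative only; real coders use suffix
# arrays for BWT and a proper entropy coder after MTF.

def bwt(data: bytes) -> bytes:
    # Naive BWT via sorted rotations, with 0 as a sentinel for simplicity.
    assert b"\x00" not in data
    s = data + b"\x00"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rotations)

def identity(data: bytes) -> bytes:
    return data

def mtf(data: bytes) -> bytes:
    # Move-to-front: turns BWT's local symbol clustering into small values,
    # standing in here for the whole encoding stage.
    table = list(range(256))
    out = bytearray()
    for b in data:
        i = table.index(b)
        out.append(i)
        table.pop(i)
        table.insert(0, b)
    return bytes(out)

def compress(data: bytes, transform=bwt) -> bytes:
    # Changing `transform` does not require any change to the second stage.
    return mtf(transform(data))
```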
Regarding LZW: I think it is possible to improve the compression, for example by keeping the dictionary sorted. Then you can predict the highest bits of the indexes using statistics, as in PPM. Additionally, you can e.g. assign a higher probability to longer dictionary entries.
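A hypothetical sketch of why the sorted dictionary helps (my illustration, not part of the original idea's description): in a sorted dictionary, all entries starting with a given byte occupy one contiguous index range, so predicting the high bits of an index reduces to predicting the leading byte, which is exactly what a PPM-style context model does. The names and the tiny example dictionary below are made up.

```python
import bisect

# Sorted toy dictionary: entries sharing a first byte form a contiguous slice.
dictionary = sorted([b"a", b"ab", b"abc", b"b", b"ba", b"c"])

def index_range(first_byte: int) -> tuple[int, int]:
    # Contiguous [lo, hi) slice of indexes whose entries start with first_byte.
    lo = bisect.bisect_left(dictionary, bytes([first_byte]))
    if first_byte < 255:
        hi = bisect.bisect_left(dictionary, bytes([first_byte + 1]))
    else:
        hi = len(dictionary)
    return lo, hi

print(index_range(ord("a")))  # (0, 3): b"a", b"ab", b"abc"

# An arithmetic coder could then assign
#   P(index) = P(first byte | context) * P(offset within the range),
# optionally skewing the offset distribution toward longer entries.
```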
Edit:
What about variable-length LZW? I.e.: usual LZW (as far as I know) uses a dictionary whose size is a power of two (some entries are free during compression) and refers to a particular entry with a bit code of fixed integer length. Instead, we could use something like prefix-free codes (Huffman codes are an example of these), which have varying lengths. The entire dictionary could be organized into a binary tree, with entries in the leaves. Each node would keep the probability of selecting its left or right child. In fact such a scheme would be a good hybrid between PPM and LZW; the advantage over plain PPM is that the hybrid could sometimes encode multiple symbols as one bit (i.e. before the actual encoding) when some edge corresponds to multiple symbols. Has anyone tried such an algorithm? A sketch of the tree coding is below. The ability to keep multi-symbol edges could be a memory saver compared to PPM. Of course such a hybrid would have the disadvantage of allowing multiple encodings of the same input string, unless perhaps David Scott makes a bijective version.
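Here is a hedged sketch of the proposed scheme, under my own assumptions about the details: dictionary entries live in the leaves of a binary tree, each internal node keeps adaptive left/right counts, and an entry's code is its root-to-leaf path, coded branch by branch like a binary context model. All class and function names are illustrative; a real coder would drive a binary arithmetic coder with these probabilities instead of just summing ideal code lengths.

```python
import math

class Node:
    # Internal nodes route left/right; leaves carry a dictionary entry.
    def __init__(self, left=None, right=None, entry=None):
        self.left, self.right, self.entry = left, right, entry
        self.counts = [1, 1]            # adaptive left/right counts

def code_length(root: Node, path: list[int]) -> float:
    # Ideal arithmetic-coded length (in bits) of one root-to-leaf path.
    node, bits = root, 0.0
    for branch in path:                 # 0 = left, 1 = right
        p = node.counts[branch] / sum(node.counts)
        bits += -math.log2(p)
        node.counts[branch] += 1        # adapt; the decoder can mirror this
        node = node.left if branch == 0 else node.right
    return bits

# Example: a 4-leaf tree; frequently used paths get cheaper over time.
leaves = [Node(entry=e) for e in (b"the", b"he", b"a", b"nd")]
root = Node(Node(leaves[0], leaves[1]), Node(leaves[2], leaves[3]))
print(code_length(root, [0, 0]))        # path to b"the": 2.0 bits initially
print(code_length(root, [0, 0]))        # ~1.17 bits on reuse, counts adapted
```

Note how a leaf like b"the" plays the role of a multi-symbol edge: one short path stands for three symbols, which is where the memory/speed advantage over plain PPM would come from.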