I wrote a new open source (GPL) bytewise LZP compressor called flzp,
http://cs.fit.edu/~mmahoney/compression/text.html#4975
It is fast, but compression isn't very impressive. It is mainly interesting as a preprocessor for another low-order compressor, where it improves both compression and speed over either program alone. For example:
57,366,279 enwik8.flzp 8 sec (2.2 GHz Athlon-64, WinXP Home)
63,391,013 enwik8.fpaq0 36 sec
39,879,072 enwik8.flzp.fpaq0 8+21 sec
It also improves compression for ppmd -o2 or -o3 (but not higher orders).
The algorithm is simple. The input is divided into 64K blocks, and any byte value that doesn't appear in a block is used to code match lengths. Matches are found in a rotating 4MB buffer through a 1M hash table indexed by an order-4 context hash. Literals are not compressed. If fewer than 32 byte values are unused, the block size is reduced, so no escape codes are needed. In the worst case, a 223-byte block with every byte different would be expanded by 33 bytes (a 32-byte bitmap header showing the available codes plus an end-of-block symbol).
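
To make that concrete, here is a rough sketch in C++ (not flzp's actual source; constants and names are illustrative) of the first pass over one block: finding the unused byte values and building the 32-byte bitmap header.

// Sketch of the first pass over one block: find which byte values are
// unused and build the 32-byte bitmap header.  Constants and names are
// illustrative, not taken from flzp's source.

#include <cstdint>
#include <cstdio>
#include <vector>

const int BLOCK = 1 << 16;   // 64K block

// Mark which byte values occur in the block, then collect the unused ones.
// Unused values become the match codes; the 256-bit bitmap is the header.
std::vector<uint8_t> find_unused(const uint8_t* blk, int len, uint8_t bitmap[32]) {
  bool seen[256] = {false};
  for (int i = 0; i < len; ++i) seen[blk[i]] = true;
  std::vector<uint8_t> codes;
  for (int c = 0; c < 256; ++c) {
    if (!seen[c]) {
      codes.push_back(uint8_t(c));
      bitmap[c >> 3] |= 1 << (c & 7);   // bit set = value is free for coding
    }
  }
  return codes;   // if codes.size() < 32, the block size would be reduced
}

int main() {
  // Tiny demo: a block that uses only byte values 0..63, so 192 values are free.
  std::vector<uint8_t> blk(BLOCK);
  for (int i = 0; i < BLOCK; ++i) blk[i] = uint8_t(i % 64);
  uint8_t bitmap[32] = {0};
  std::vector<uint8_t> codes = find_unused(blk.data(), BLOCK, bitmap);
  printf("%d byte values free for match codes\n", int(codes.size()));
  return 0;
}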
Speed is asymmetric: it decompresses twice as fast as it compresses because the compressor makes two passes over each block. The first pass builds the list of available byte values and determines the block size.
I might add a low-order context model to make a fast program with decent compression. Or I might modify it to code only long matches, as a preprocessor for a BWT compressor (to speed up sorting) or a CM compressor (to reduce input size for speed). Currently it codes matches of length 2 or more as 1 byte.
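
For illustration, here is a rough sketch of the match/literal decision in the coding pass. The length mapping (i-th free code = match of length i+2), the hash multiplier, and the reserved end-of-block code are my assumptions for the sake of the example, not flzp's actual scheme.

// Sketch of the match/literal decision during the second (coding) pass.
// The mapping "i-th free code = match of length i+2" is an assumption for
// illustration; flzp's real length coding may differ.

#include <cstdint>
#include <cstdio>
#include <vector>

const int BUF   = 1 << 22;   // 4MB rotating history buffer
const int HBITS = 20;        // 1M-entry hash table

// Order-4 context hash of the last 4 bytes (multiplier is illustrative).
uint32_t hash4(uint32_t last4) {
  return (last4 * 2654435761u) >> (32 - HBITS);
}

// buf   : rotating history buffer holding the input
// p,end : current position and end of the current block
// ht    : maps an order-4 context hash to the last position seen there
// codes : byte values unused in this block (from the first pass)
// Emits one output byte (a match code or a literal) and returns the
// number of input bytes consumed.
int emit(const uint8_t* buf, int p, int end, uint32_t* ht,
         const std::vector<uint8_t>& codes, std::vector<uint8_t>& out) {
  uint32_t last4 = 0;
  for (int i = 1; i <= 4; ++i) last4 = (last4 << 8) | buf[(p - i) & (BUF - 1)];
  uint32_t h = hash4(last4);
  int q = int(ht[h]);        // predicted match position
  ht[h] = uint32_t(p);       // point the table at the current position
  int len = 0, maxlen = int(codes.size());  // reserve the last code for end-of-block
  while (p + len < end && len < maxlen && buf[(q + len) & (BUF - 1)] == buf[p + len])
    ++len;
  if (len >= 2) {            // a match of length 2+ costs one free byte
    out.push_back(codes[len - 2]);
    return len;
  }
  out.push_back(buf[p]);     // literals pass through uncoded
  return 1;
}

int main() {
  // Toy demo: a phrase repeated three times; the repeats are found as matches.
  std::vector<uint8_t> buf(BUF, 0);
  const char* s = "the cat sat. the cat sat. the cat sat. ";
  int n = 0;
  while (s[n]) { buf[n] = uint8_t(s[n]); ++n; }
  std::vector<uint32_t> ht(1 << HBITS, 0);
  std::vector<uint8_t> codes;                     // pretend 128..255 are unused
  for (int c = 128; c < 256; ++c) codes.push_back(uint8_t(c));
  std::vector<uint8_t> out;
  for (int p = 4; p < n; ) p += emit(buf.data(), p, n, ht.data(), codes, out);
  printf("%d input bytes coded as %d output bytes\n", n - 4, int(out.size()));
  return 0;
}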