I guess it was provocative enough to get an answer 
> > as its speed is bound by i/o anyway.
>
> In kzip? It definitely isn't and there's a lot of room for improvement.
Decoding speed is the same for kzip-generated archives and any others.
And here's a quick test with a directory containing sources and executables
of some random program (it's a slow CPU, I know):
Code:
4161950 unpacked
1570036 0.563s // infozip 2.3
1565043 0.844s // infozip 2.3 -9
1533995 1.188s // 7z 4.64 a -tzip
1509658 7.937s // 7z 4.64 a -tzip -mx9
1509789 48.203s // kzip
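To put those numbers in relative terms, here's a small sketch that only redoes the arithmetic on the figures above. The names (`RESULTS`, `relative`) are made up for illustration; the sizes and times are copied straight from the listing.

```python
# Sizes (bytes) and times (seconds) copied from the listing above.
UNPACKED = 4161950
RESULTS = [
    ("infozip 2.3",          1570036, 0.563),
    ("infozip 2.3 -9",       1565043, 0.844),
    ("7z 4.64 a -tzip",      1533995, 1.188),
    ("7z 4.64 a -tzip -mx9", 1509658, 7.937),
    ("kzip",                 1509789, 48.203),
]

def relative(results, unpacked):
    """Return (name, ratio, saving, slowdown) rows relative to the first entry."""
    _, base_size, base_time = results[0]
    rows = []
    for name, size, secs in results:
        ratio = size / unpacked                    # compressed / original size
        saving = (base_size - size) / base_size    # bytes saved vs. the baseline
        slowdown = secs / base_time                # time vs. the baseline
        rows.append((name, ratio, saving, slowdown))
    return rows

for name, ratio, saving, slowdown in relative(RESULTS, UNPACKED):
    print(f"{name:22s} ratio {ratio:.3f}  saving {saving:6.2%}  time x{slowdown:.1f}")
```

The baseline here is plain infozip 2.3, so "saving" is how much smaller each archive is than the default one.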
And now, who would care about 4% if it's a zip archive anyway?
And reaching HDD speed shouldn't be a problem even with some
simple match sequence optimization.
Anyway, deflate is not the kind of algorithm where tweaking
would bring a lot of compression improvement.
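If you want to try the same kind of comparison yourself, here's a rough sketch using Python's standard `zlib` module. This is zlib's deflate, not infozip, 7-zip or kzip, so the numbers won't match the listing; the sample data and the `deflate_levels` helper are assumptions made up for the example.

```python
import time
import zlib

def deflate_levels(data, levels=(1, 6, 9)):
    """Compress `data` at several zlib levels; return {level: (size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Any deflate stream must decode back to the original input.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results

# Hypothetical sample input: semi-repetitive text, roughly 2.5 MB.
sample = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % (i % 977)
    for i in range(50000)
)

for level, (size, secs) in deflate_levels(sample).items():
    print(f"level {level}: {size} bytes, {secs:.3f}s")
```

Swap `sample` for the contents of a real file to see how much (or how little) the higher levels buy on your own data.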
> > > and maybe multithreaded implementations as well.
> > Well, 7-zip has that I think.
> > But anyway, multithreaded implementations are not easily portable.
> OpenMP is pretty portable.
Well, I'd really like it if somebody just gave me a working gcc 4.2+ for iPhone.
But I actually talked about a somewhat different kind of portability.
Not about whether a multi-threaded compression library would work at all,
but about _how_ it would work.
Most applications are still not that thread-aware, and ones that are
mostly have some dirty hacks - so quirks after attaching a multi-threaded
compression library are kinda guaranteed.
Though the actual problem is still something else: the thing is, any speed gain
from multi-threaded implementations is only possible in certain circumstances,
basically when you can process a lot of data at once. And for something
like decoding a 20k web page it would probably be twice as slow as a
simple single-threaded version (and that's not considering other things).
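A minimal sketch of what I mean, assuming the simplest possible scheme: split the input into fixed-size chunks and deflate each one independently in a thread pool. The chunk size, the worker count and all function names are hypothetical, and it's Python's `zlib` here rather than any real multi-threaded zip library.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # hypothetical chunk size: 1 MiB

def split(data, chunk=CHUNK):
    """Cut `data` into chunk-sized pieces (one empty piece for empty input)."""
    return [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]

def compress_parallel(data, workers=4, chunk=CHUNK):
    """Return a list of independent zlib streams, one per chunk."""
    parts = split(data, chunk)
    if len(parts) == 1:
        # Small input: there is nothing to parallelize, so starting a
        # thread pool would be pure overhead. Compress on this thread.
        return [zlib.compress(parts[0])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, parts))

def decompress_all(streams):
    """Decode the per-chunk streams and join them back together."""
    return b"".join(zlib.decompress(s) for s in streams)

page = b"<html>" * 3000               # ~18k, like a small web page
big = bytes(range(256)) * 12289       # a bit over 3 MiB
print(len(compress_parallel(page)), len(compress_parallel(big)))
```

With an input smaller than one chunk there's only a single stream, so extra threads have nothing to do. Also note that each chunk starts with an empty window, so this kind of splitting costs some compression as well.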
> > BWT/CM won't ever be faster than LZ decoding, multi-threaded or not.
> Didn't you say the opposite recently?
It's not really a contradiction
> That one could make CM with LZMA decoding speed and better compression?
I said that I think it should be possible to make a _new_ CM with
faster decoding than _current_ LZMA, and somewhat better compression.
And here we're talking about deflate, based on static bitcode, which
I specifically mentioned.