I'm not really sure what kind of algorithm you have in mind there,
but compressors in general look like this:
Code:
do {
  (buf,len) = read_input( bufsize );       // read up to bufsize bytes
  (buf2,len2) = process_data( buf,len );   // compress or decompress the block
  write_output( buf2, len2 );
} while( len==bufsize );                   // a short read means end of input
Of course, quite a few people use getc/putc all over the program
instead, but that's not worth talking about: such code can be converted
to buffered i/o and in-memory processing without any platform-specific methods.
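To illustrate that conversion, here is a minimal sketch using only standard <cstdio> calls. The names transform_byte, filter_bytewise and filter_buffered are made up for this example, and the XOR is just a stand-in for real processing.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Placeholder for the real per-byte work (hypothetical).
static inline unsigned char transform_byte( unsigned char c ) { return c ^ 0x55; }

// getc/putc style: one library call per input byte and per output byte.
void filter_bytewise( FILE* in, FILE* out ) {
  int c;
  while( (c = getc( in )) != EOF ) putc( transform_byte( (unsigned char)c ), out );
}

// Same filter with block i/o: one fread/fwrite per block,
// and the processing itself only touches memory.
void filter_buffered( FILE* in, FILE* out, size_t bufsize ) {
  assert( in && out && bufsize>0 );
  std::vector<unsigned char> buf( bufsize );
  size_t len;
  do {
    len = fread( buf.data(), 1, bufsize, in );
    for( size_t i=0; i<len; i++ ) buf[i] = transform_byte( buf[i] );
    fwrite( buf.data(), 1, len, out );
  } while( len==bufsize );   // a short read means end of input
}
```

Both functions should produce the same output; the second one has the same shape as the generic loop above.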
But then, there's the problem of blocking.
That is, read_input() and write_output() waste some time even
though the OS normally only has to copy some memory buffers there
(given sensible buffer sizes, which were discussed before).
So it's possible to improve the processing speed by handling the i/o
in other thread(s), in parallel with the (de)compression.
That way, we can read more data into a new buffer while the old buffer
is being processed in another thread, etc.
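A rough sketch of that read-ahead scheme with two buffers follows. It is only an illustration: it uses std::thread and a condition variable as a stand-in for raw thread calls, the ReadAhead and checksum_file names are invented here, and a byte sum takes the place of process_data().

```cpp
#include <cassert>
#include <cstdio>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Two buffers: the reader thread fills one while the caller processes the other.
struct ReadAhead {
  FILE* f;
  size_t bufsize;
  std::vector<unsigned char> buf[2];
  size_t len[2] = { 0, 0 };
  bool full[2] = { false, false };
  bool done = false, stop = false;
  int cur = -1;                      // slot currently owned by the caller
  std::mutex m;
  std::condition_variable cv;
  std::thread t;

  ReadAhead( FILE* f_, size_t bufsize_ ) : f(f_), bufsize(bufsize_) {
    assert( f && bufsize>0 );
    buf[0].resize( bufsize ); buf[1].resize( bufsize );
    t = std::thread( [this]{ reader(); } );
  }

  ~ReadAhead() {
    { std::lock_guard<std::mutex> lk(m); stop = true; }
    cv.notify_all();
    t.join();
  }

  void reader() {
    for( int i=0;; i^=1 ) {
      {
        std::unique_lock<std::mutex> lk(m);
        cv.wait( lk, [&]{ return !full[i] || stop; } );  // wait for a free slot
        if( stop ) return;
      }
      size_t n = fread( buf[i].data(), 1, bufsize, f );  // blocking read, lock not held
      {
        std::lock_guard<std::mutex> lk(m);
        len[i] = n; full[i] = true;
        if( n<bufsize ) done = true;                     // short read: end of input
      }
      cv.notify_all();
      if( n<bufsize ) return;
    }
  }

  // Releases the previous block for refilling and returns the next one.
  size_t next( const unsigned char*& p ) {
    std::unique_lock<std::mutex> lk(m);
    if( cur>=0 ) { full[cur] = false; cv.notify_all(); }
    cur = (cur<0) ? 0 : cur^1;
    cv.wait( lk, [&]{ return full[cur] || done; } );
    p = buf[cur].data();
    return full[cur] ? len[cur] : 0;
  }
};

// Same loop shape as the generic compressor; the byte sum stands in for process_data().
unsigned long long checksum_file( FILE* f, size_t bufsize ) {
  ReadAhead ra( f, bufsize );
  unsigned long long sum = 0;
  const unsigned char* p; size_t len;
  do {
    len = ra.next( p );
    for( size_t i=0; i<len; i++ ) sum += p[i];
  } while( len==bufsize );
  return sum;
}
```

The same pattern can be mirrored on the output side, with a writer thread draining filled buffers.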
And implementing that basically requires only one platform-specific
function - CreateThread or similar. Well, also Sleep or SuspendThread,
but those won't improve the processing speed on a multicore system, though
they would reduce the load.
Thus, it's easy to write a multiplatform wrapper function for that
and make a portable compressor with async i/o.
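Such a wrapper could look roughly like this. It's a sketch, not a finished library: thread_t, thread_fn, thread_create and thread_join are names invented for the example, and it assumes CreateThread on Windows and pthread_create everywhere else. (In current C++, std::thread already provides the same thing.)

```cpp
#include <cassert>
#ifdef _WIN32
  #include <windows.h>
  typedef HANDLE thread_t;
#else
  #include <pthread.h>
  typedef pthread_t thread_t;
#endif

typedef void (*thread_fn)( void* );

// Carries the portable callback through the platform-specific entry point.
struct thread_start { thread_fn fn; void* arg; };

#ifdef _WIN32
static DWORD WINAPI thread_entry( LPVOID p ) {
#else
static void* thread_entry( void* p ) {
#endif
  thread_start s = *(thread_start*)p;
  delete (thread_start*)p;
  s.fn( s.arg );
  return 0;
}

// Starts fn(arg) in a new thread; returns false if the thread could not be created.
bool thread_create( thread_t* t, thread_fn fn, void* arg ) {
  assert( t && fn );
  thread_start* s = new thread_start{ fn, arg };
#ifdef _WIN32
  *t = CreateThread( NULL, 0, thread_entry, s, 0, NULL );
  bool ok = (*t != NULL);
#else
  bool ok = (pthread_create( t, NULL, thread_entry, s ) == 0);
#endif
  if( !ok ) delete s;
  return ok;
}

// Waits for the thread to finish and releases its handle.
void thread_join( thread_t t ) {
#ifdef _WIN32
  WaitForSingleObject( t, INFINITE );
  CloseHandle( t );
#else
  pthread_join( t, NULL );
#endif
}
```

The rest of the compressor then only calls thread_create/thread_join and never sees the platform headers.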
Now, as for "completion ports": it's basically a Windows-specific
library with a fairly complex API, meant for efficient implementation of such
processing routines. And its only noticeable benefit is that it removes
the need to do something about i/o threads when they have no
work, which would be 1-2 calls per data block.
In fact, it's quite believable that when handling 1000 streams at once
that would be a noticeable improvement (though likely not a large one anyway).
But in the specific case of Cyan's compressor the i/o threads would likely
be fully loaded, and the processing thread would have to wait instead.
So "completion ports" won't do anything except make the program
hard to port.
Cyan: also note that it makes sense to do fopens asynchronously
as well - especially the one for reading.
It's a bit hard to invent a job which the main thread can do
before it has access to the input data, but still, at least stuff like
initialization of the statistical model, lookup table precalculation etc.
can be done there. Preallocation of large buffers can probably be placed
there too.
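Something like the following sketch shows the idea, using std::async for the background fopen. The Startup struct, start() and the CRC32 table are just example choices for "work that doesn't need the input yet", not anything from Cyan's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <future>
#include <vector>

// Example of setup work that needs no input data: a CRC32 lookup table.
std::vector<uint32_t> make_crc_table() {
  std::vector<uint32_t> t( 256 );
  for( uint32_t i=0; i<256; i++ ) {
    uint32_t c = i;
    for( int k=0; k<8; k++ ) c = (c>>1) ^ (0xEDB88320u & (0u-(c&1)));
    t[i] = c;
  }
  return t;
}

struct Startup {
  FILE* in;                          // NULL if the open failed
  std::vector<uint32_t> table;
  std::vector<unsigned char> buf;
};

Startup start( const char* path, size_t bufsize ) {
  assert( path );
  // fopen runs in another thread...
  std::future<FILE*> f = std::async( std::launch::async,
                                     [path]{ return fopen( path, "rb" ); } );
  // ...while the main thread does the work that doesn't depend on the input.
  Startup s;
  s.table = make_crc_table();
  s.buf.resize( bufsize );           // preallocate the large buffer
  s.in = f.get();                    // wait for the open to finish
  return s;
}
```

Whether this saves anything measurable depends on how slow the open is (network drives, antivirus hooks, etc.), so treat it as something to benchmark rather than a guaranteed win.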