Today, 02:41
Yes, totally.
I wouldn't bother with that kind of optimization.
Yes.
The underlying concept is that the baseline LZ4 implementation in lz4.c
can be made malloc-less.
The only need is some workspace for LZ4 compression context,
and even that one can be allocated on stack, or allocated externally.
It's also possible to redirect the few LZ4_malloc() invocations to externally defined functions: https://github.com/lz4/lz4/blob/dev/lib/lz4.c#L190
Yes, that's correct.
Yes.
All that matters is that the state is correctly initialized at least once.
There are many ways to do that, LZ4_compress_fast_extState() is one of them.
LZ4_initStream() is another one.
memset()-ing the area to zero should also work fine.
Finally, creating the state with LZ4_createStream() guarantees that it's correctly initialized from the get go.
I don't remember the exact details.
To begin with, initialization is very fast, and it only makes sense to skip it for tiny inputs.
Moreover, I believe that skipping initialization can result in a subtle impact on branch prediction later on,
resulting in slower compression speed.
I _believe_ this issue has been mitigated in latest versions, but don't remember for sure.
Really, this would deserve to be benchmarked, to see if there is any benefit, or detriment, in avoiding initialization for larger inputs.
I don't see that comment.
Compression state LZ4_stream_t should be valid for both single-shot and streaming compression.
What matters is to not forget to fast-reset it before starting a new stream.
For single-shot, it's not necessary because the fast-reset is "embedded" into the one-shot compression function.
Yes, it is.
It's generally better (i.e. faster) to ensure that the new block and its history are contiguous (and in the right order).
Otherwise, decompression will still work, but the critical loop becomes more complex: it must determine which buffer the copy pointer starts in, check overflow conditions, etc.
So, basically, try to keep the history and the new block contiguous, and speed should be optimal.