Some time ago I wrote about the problem of reversed entropy coding (thread): encoding any bit sequence as a sequence of symbols having some chosen statistics, such that only the sender knows the probability distributions. This can be used, for example, to make the encoded sequence resemble some common type of data for steganographic/watermarking purposes, like QR codes resembling a picture (grayness becomes the probability of using 1 in the corresponding pixel of the code - ICIP 2014 paper).
If the decoder knows the probability distributions, this can easily be done by using an entropy coder with the encoder and decoder switched.
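For intuition, here is a minimal Python sketch of that switched direction, using a toy prefix (Huffman-style) code in place of a real arithmetic/ANS coder; the code table and target distribution P(a)=1/2, P(b)=1/4, P(c)=1/4 are assumed just for illustration:

# "Encoding" = running the prefix DEcoder on the payload bits, which produces
# symbols following the chosen statistics; "decoding" = re-encoding them.
CODE = {'a': '0', 'b': '10', 'c': '11'}     # Huffman code matching the target stats
DECODE = {v: k for k, v in CODE.items()}

def bits_to_symbols(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in DECODE:                   # a full codeword has been read
            out.append(DECODE[buf])
            buf = ''
    return out, buf                         # buf = leftover bits (pad them in practice)

def symbols_to_bits(symbols):
    return ''.join(CODE[s] for s in symbols)

payload = '110100100'
symbols, leftover = bits_to_symbols(payload)
assert symbols_to_bits(symbols) + leftover == payload
print(symbols)                              # ['c', 'a', 'b', 'a', 'b', 'a']

The receiver recovers the payload simply by running the encoder (symbols_to_bits) - which is exactly why this easy case requires the decoder to know the distribution.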
If the decoder doesn't know the distributions, it is more difficult, but the same rates can still be approached (a generalization of the Kuznetsov-Tsybakov problem).
The interesting question for data compression is: can we perform standard entropy coding with the probability distributions unknown to the encoder (known only to the decoder)?
In other words, can we remove the statistical modelling from the compressor? (the decoder still has to do it)
Thanks to recently working on JRC, I have just realized that it can be done!
JRC (Joint Reconstruction Codes, thread) makes it possible to perform error correction where, in contrast to standard error correction, the encoder doesn't need to know the damage levels (in order to choose redundancy levels). This is useful for example for broadcasting or storage media - while the encoder has to use universal packets, each packet (receiver) can have a different noise level.
Instead, the JRC decoder uses the damage levels as individual trust levels for each packet - it tries to guess the original message so that it agrees with all the received packets, tolerating some disagreement depending on the noise level of each packet.
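To give a rough sketch in Python of this kind of per-packet trust weighting (the general idea only, not the exact JRC scoring; for simplicity each packet is treated as a noisy copy of the candidate message over a binary symmetric channel), a candidate is scored by summing log-likelihoods using each packet's own bit error rate, so that noisier packets tolerate more disagreement:

import math

def packet_log_likelihood(candidate_bits, received_bits, p):
    # log-likelihood of receiving received_bits over a BSC(p) if candidate_bits were sent
    return sum(math.log(p if c != r else 1.0 - p)
               for c, r in zip(candidate_bits, received_bits))

def score(candidate_bits, packets):
    # packets: list of (received_bits, bit_error_rate); only the decoder needs the error rates
    return sum(packet_log_likelihood(candidate_bits, rec, p) for rec, p in packets)

packets = [([0, 1, 1, 0], 0.05),   # nearly clean packet - trusted a lot
           ([1, 1, 1, 0], 0.30)]   # noisy packet - disagreement is cheaper
print(score([0, 1, 1, 0], packets), score([1, 1, 1, 0], packets))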
JRC assumes that the message bits come from the uniform {0,1} probability distribution - we can put the statistical model there instead! (to restrict the space of interesting messages and hence the required information)
So how to get entropy coding without the encoder's knowledge of the probabilities...
Assume the sequence comes from a statistical model (i.i.d., Markov, etc.) with entropy H bits/symbol - an N-symbol sequence can then be transformed into a bit sequence of asymptotically NH bits.
Directly storing these symbols would take lg(m) bits/symbol instead, where m is the alphabet size and lg is the base-2 logarithm.
In other words, while the number of all length-N sequences grows like m^N = 2^(N lg(m)), the statistical model restricts this growth to asymptotically 2^(NH) - and we can apply this restriction on the decoder side.
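A tiny numerical check of this counting argument, for an assumed i.i.d. source over m = 4 symbols with probabilities 1/2, 1/4, 1/8, 1/8:

from math import log2

p = [0.5, 0.25, 0.125, 0.125]
m, N = len(p), 1000
H = -sum(q * log2(q) for q in p)   # entropy = 1.75 bits/symbol

print(N * log2(m))                 # 2000 bits to store the raw symbols (lg(m) = 2 bits each)
print(N * H)                       # ~1750 bits: the model shrinks the relevant set to ~2^(NH)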
So finally, the encoder should produce slightly more than NH bits, allowing the decoder to choose one of the ~2^(NH) messages in agreement with the statistical model.
These >NH bits can be exactly like in JRC: uniformly distributed checksum bits, each depending on the entire history of the message so far.
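One simple way such bits could be produced (an assumed construction for illustration - the actual JRC checksums may be built differently) is to hash the growing prefix; the emission rate (here one bit per symbol) just has to slightly exceed H bits/symbol:

import hashlib

def checksum_bit(prefix):
    # a (pseudo-)uniform bit that depends on the whole history so far
    return hashlib.sha256(bytes(prefix)).digest()[0] & 1

def encode(symbols):
    # encoder output: only checksum bits - no knowledge of the statistics needed
    return [checksum_bit(symbols[:i]) for i in range(1, len(symbols) + 1)]

print(encode([0, 2, 1, 1, 0, 3]))   # six checksum bits for a 6-symbol message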
Now the decoder needs to search the space of ~2^(NH) possible messages - this can be done by sequential decoding as in the current implementation of JRC: just add the statistical model to the probabilities (weights) of the candidates being considered.
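Here is a minimal sketch of that decoder-side search, using a simple beam search as a stand-in for JRC's sequential decoding and the same toy checksum as above: candidate prefixes are weighted by the statistical model (known only to the decoder) and dropped when they disagree with a received checksum bit (received intact in this compression setting - in the noisy JRC setting they would only be down-weighted):

import hashlib
from math import log2

def checksum_bit(prefix):                        # same toy construction as above
    return hashlib.sha256(bytes(prefix)).digest()[0] & 1

def encode(symbols):                             # one checksum bit per symbol
    return [checksum_bit(symbols[:i]) for i in range(1, len(symbols) + 1)]

P = [0.9, 0.05, 0.03, 0.02]                      # decoder-side i.i.d. model, H ~ 0.62 bits/symbol

def decode(checksums, beam=64):
    candidates = [((), 0.0)]                     # (prefix, log2 of model probability)
    for bit in checksums:
        extended = [(pre + (s,), w + log2(P[s]))
                    for pre, w in candidates for s in range(len(P))
                    if checksum_bit(pre + (s,)) == bit]
        candidates = sorted(extended, key=lambda c: -c[1])[:beam]
    # the true prefix always passes its own checksums, so it normally survives
    return list(candidates[0][0])                # most probable surviving candidate

msg = [0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0]
print(decode(encode(msg)) == msg)                # typically True; may fail on unlucky checksum collisions

With the checksum rate above H (here 1 bit/symbol versus H ~ 0.62, i.e. 12 bits instead of 24 for raw storage), a typical message is recovered with high probability; the beam width and the safety margin above NH trade decoding cost against failure probability.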
I could modify the JRC implementation if there were some real applications.
One application could be making cheaper encoders, e.g. for sensor networks (however, decoding would become much more expensive).
Another could be distributed source coding type applications - imagine sensors spread over some area. Individually they have too poor statistics to adapt to varying conditions, hence it might be beneficial to shift the need of knowing the statistical model to the decoder, which has information from all of them.
What real-life applications of such compression without statistical modelling can you think of?