There is a nice old concept of fractal compression for pictures: finding self-similarities and exploiting them, for example by encoding the mappings and the differences:
https://en.wikipedia.org/wiki/Fractal_compression
It never seemed to be very successful, but maybe an analogous approach could be reasonable for text or binary file compression?
Specifically, while standard LZ schemes require a perfect match for a sequence, there should be many more sequences with a nearly perfect match – it might be beneficial to extend the LZ scheme to also allow imperfect matches and then encode the differences: for example a byte sequence to XOR with, dominated by zeros – such a difference could be stored cheaply using an entropy coder.
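To make this concrete, here is a minimal toy sketch in Python (the token format and helper names are made up for illustration, not a real codec): an approximate match is stored as (offset, length, xor_mask), where the mask is mostly zero bytes and would be handed to an entropy coding stage.

# Toy "approximate LZ" token: (offset, length, xor_mask).
# The mask is dominated by zero bytes, so an entropy coder would store it cheaply.
# Format and names are hypothetical, only to illustrate the idea.

def emit_token(history: bytes, pos: int, offset: int, length: int, data: bytes):
    """Encoder side: reference history[pos-offset : pos-offset+length] and
    record the byte-wise XOR difference with the actual data."""
    ref = history[pos - offset : pos - offset + length]
    mask = bytes(a ^ b for a, b in zip(data[:length], ref))
    return (offset, length, mask)

def apply_token(out: bytearray, offset: int, length: int, mask: bytes):
    """Decoder side: copy the referenced sequence and XOR in the stored mask."""
    start = len(out) - offset
    for i in range(length):
        out.append(out[start + i] ^ mask[i])

# Example: "similarity" encoded against the earlier "similarly_"
# -> only 3 of 10 mask bytes are non-zero.
history = bytearray(b"similarly_ ")
token = emit_token(bytes(history), len(history), offset=11, length=10,
                   data=b"similarity")
apply_token(history, *token)
assert history.endswith(b"similarity")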
The problem is that finding similar sequences (on the encoder side) seems very costly.
This is a very common task in DNA alignment, and a few days ago I realized that such a search can be done relatively cheaply:
http://arxiv.org/pdf/1602.05889
Specifically, rate distortion theory allows one to construct Distortion-Resistant Hashing (DRH): hash values which do not change under small distortions of an object (such as a sequence).
So searching for similar sequences reduces to searching for objects with the same DRH value, and such collisions are relatively cheap to find.
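This is not the construction from the paper, just a toy Python illustration of the general principle: hash only a coarse "skeleton" of a window (here the high nibbles of its bytes), so that small distortions of the values leave the hash unchanged and near-matches show up as ordinary hash-table collisions. This toy version only tolerates small changes to byte values, not insertions or deletions.

# Toy distortion-resistant hash (NOT the paper's construction):
# hash a coarse quantization of the window, so near-matches collide.

from collections import defaultdict

def drh_toy(seq: bytes, window: int = 16) -> int:
    """Hash of the high nibbles only; changing the low bits of bytes
    does not affect the value."""
    skeleton = bytes(b & 0xF0 for b in seq[:window])
    return hash(skeleton)

def build_index(history: bytes, window: int = 16):
    """Encoder-side index: bucket every window position by its hash;
    only positions in the same bucket are checked for a near match."""
    buckets = defaultdict(list)
    for pos in range(len(history) - window + 1):
        buckets[drh_toy(history[pos:pos + window], window)].append(pos)
    return buckets

history = b"compression is about finding repeated structure"
index = build_index(history)
query = bytes(b ^ 0x01 for b in b"repeated structu")  # slightly distorted copy
print(index[drh_toy(query)])  # candidate position(s) of the near match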
Does an LZ scheme which (additionally!) allows partial matches followed by coding of the differences seem reasonable? Have you come across such approaches?
What other applications of distortion-resistant hashing can you think of?