I am supposed to work as a consultant helping with the design of data compressors, which motivated me to take a closer look and rethink a few things. After directly predicting parameters (e.g. of a Laplace distribution) from context and optimizing the color transform, it is time to take a closer look at DCT coefficients. Some first observations: https://arxiv.org/pdf/2007.12055
1) I have checked the DST and DCT families, and DCT-II is indeed the best; attempts at further optimization gave barely a few bits of improvement per 8x8 block.
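As a quick sanity check of such comparisons (my own toy setup, not from the linked paper): transform coding gain of a few 8-point transforms on an AR(1) model of image rows, using scipy:

```python
import numpy as np
from scipy.fft import dct, dst

# Toy sanity check: transform coding gain of a few 8-point transforms
# on an AR(1) source with correlation 0.95 (a common stand-in for image rows).
N, r = 8, 0.95
C = r ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # AR(1) covariance

def coding_gain(T):
    v = np.diag(T @ C @ T.T)                                     # coefficient variances after transform
    return 10 * np.log10(v.mean() / np.exp(np.log(v).mean()))    # arithmetic/geometric mean ratio in dB

for name, T in [("DCT-II", dct(np.eye(N), type=2, axis=0, norm="ortho")),
                ("DCT-IV", dct(np.eye(N), type=4, axis=0, norm="ortho")),
                ("DST-II", dst(np.eye(N), type=2, axis=0, norm="ortho"))]:
    print(f"{name}: coding gain {coding_gain(T):.2f} dB")
```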
2) There is "common knowledge" that AC coefficients come from a Laplace distribution - I have taught it to students ... but looking at the data, this is just wrong!
Checking the exponential power distribution (EPD) family rho ~ exp(-|x|^k), we don't get the k=1 of a Laplace distribution, but rather k~0.5, which is quite different: a more compact body and thicker tails.
Using a Laplace distribution for values coming from a k=0.5 EPD, we are wasting ~0.1 bits/value ... which for RGB without subsampling can mean a ~0.3 bpp file-size reduction.
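A minimal sketch of how such a penalty can be estimated, assuming scipy's gennorm (the exponential power / generalized normal family) and synthetic k=0.5 samples as a stand-in for real AC coefficients:

```python
import numpy as np
from scipy.stats import gennorm, laplace

# Sketch: estimate the bits/value lost by modeling k~0.5 EPD data with a Laplace
# distribution; synthetic EPD samples stand in for real AC coefficients here.
rng = np.random.default_rng(0)
x = gennorm.rvs(0.5, scale=1.0, size=50_000, random_state=rng)

k_hat, _, s_epd = gennorm.fit(x, floc=0)        # ML fit of the EPD shape and scale
_, s_lap = laplace.fit(x, floc=0)               # ML fit of the Laplace scale

# Average negative log-likelihood in bits; the difference approximates the coding penalty.
bits_epd = -np.mean(gennorm.logpdf(x, k_hat, scale=s_epd)) / np.log(2)
bits_lap = -np.mean(laplace.logpdf(x, scale=s_lap)) / np.log(2)
print(f"fitted shape k ~ {k_hat:.2f}, Laplace penalty ~ {bits_lap - bits_epd:.3f} bits/value")
```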
Here is such an evaluation; at the bottom, perfect agreement would be a flat line, while Laplace is the orange one:
The question is where exactly the Laplace assumption for AC coefficients is used directly - those places might have this improvement potential ...
One is the Golomb code, which is "optimal" for some Laplace distributions among prefix codes ... does anybody still use it in modern image/video compressors?
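For reference, a minimal Golomb encoder sketch (unary quotient plus truncated-binary remainder); the parameter m would be tuned to the assumed Laplace/geometric decay:

```python
def golomb_encode(n: int, m: int) -> str:
    """Golomb code of nonnegative integer n with parameter m >= 1, as a bit string:
    unary-coded quotient, then truncated-binary remainder (Rice code when m is a power of 2)."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"                 # unary part: q ones terminated by a zero
    b = (m - 1).bit_length()             # b = ceil(log2 m)
    cutoff = (1 << b) - m                # first `cutoff` remainders get b-1 bits, the rest b bits
    if r < cutoff:
        bits += format(r, f"0{b - 1}b")
    elif b > 0:
        bits += format(r + cutoff, f"0{b}b")
    return bits

# e.g. golomb_encode(5, 3) == "1011": quotient 1 -> "10", remainder 2 -> "11"
```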
I remember discussing PVQ with Jean-Marc Valin - I showed how to optimize it for a uniform distribution on the sphere, which is perfect for a multivariate Gaussian distribution ... but he said the values come from a Laplace distribution, for which the basic PVQ turned out better. If it is k~1/2 instead, we again should be able to reduce MSE by ~10-20% with such a deformation.
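A rough sketch of what I mean by such a deformation - these are hypothetical helper functions with a simplified greedy PVQ search, not the Opus/Daala implementation:

```python
import numpy as np

def pvq_quantize(x, K):
    """Simplified greedy PVQ: integer vector y with sum(|y|) == K approximating x's direction."""
    s, a = np.sign(x), np.abs(x)
    target = a * K / a.sum()             # ideal (non-integer) pulse allocation on the L1 sphere
    y = np.floor(target).astype(int)
    while y.sum() < K:                   # add the remaining pulses where the deficit is largest
        i = np.argmax(target - y)
        y[i] += 1
    return s * y

def pvq_quantize_deformed(x, K, p=0.5):
    """Hypothetical: quantize sign(x)*|x|^p instead of x, to better match a k~1/2 EPD,
    then undo the deformation on the reconstruction."""
    yd = pvq_quantize(np.sign(x) * np.abs(x) ** p, K)
    return np.sign(yd) * np.abs(yd.astype(float)) ** (1.0 / p)
```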
3) I have also worked on flexible quantization for such densities - automatically getting a better one than uniform, while avoiding costly standard techniques like Lloyd-Max.
Are there some more practical techniques?
I have defined a "quantization density" q, which chooses how dense the local quantization is - a kind of N->infinity limit of the number of quantization regions, to finally be used for finite N.
Taking its inverse CDF on a size-N regular lattice, we get a (nonuniform) quantization, e.g.:
This q/Q quantization density can be optimized for a chosen probability density; the above is for minimizing MSE distortion, for which, as shown, we should choose
q ~ rho^{1/3}
so we should use e.g. twice denser quantization for 8 times denser regions.
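A minimal sketch of this construction (grid, density and N=16 chosen for illustration, here for a k=1/2 EPD):

```python
import numpy as np

def nonuniform_quantizer(rho, xs, N):
    """N reconstruction points from quantization density q ~ rho^(1/3):
    normalize the CDF of q and take its inverse on a regular size-N lattice in [0,1]."""
    q = rho(xs) ** (1.0 / 3.0)
    Q = np.cumsum(q)
    Q /= Q[-1]                                  # normalized CDF of the quantization density
    lattice = (np.arange(N) + 0.5) / N          # regular lattice in [0,1]
    return np.interp(lattice, Q, xs)            # inverse CDF -> nonuniform reconstruction points

# example: k = 1/2 exponential power density rho ~ exp(-|x|^0.5)
xs = np.linspace(-50, 50, 20001)
rho = lambda x: np.exp(-np.abs(x) ** 0.5)
centers = nonuniform_quantizer(rho, xs, 16)
```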
However, such distortion-only optimization increases entropy due to the more uniform distribution of the quantized values - which turns out to be a serious problem, also concerning e.g. Lloyd-Max.
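Continuing the sketch above (same xs, rho, centers), a rough numerical check of this rate/distortion tradeoff against a uniform quantizer on the same range:

```python
def rate_mse(centers, xs, w):
    """Entropy [bits] and MSE of the quantizer with reconstruction points `centers`,
    for a density discretized as weights `w` (summing to 1) on grid `xs`."""
    edges = (centers[1:] + centers[:-1]) / 2          # decision boundaries at midpoints
    idx = np.searchsorted(edges, xs)                  # cell index of each grid point
    p = np.bincount(idx, weights=w, minlength=len(centers))
    mse = np.sum(w * (xs - centers[idx]) ** 2)
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return H, mse

w = rho(xs); w /= w.sum()
uniform = np.linspace(xs[0], xs[-1], 16)
print("uniform:    H=%.2f bits, MSE=%.3f" % rate_mse(uniform, xs, w))
print("rho^(1/3):  H=%.2f bits, MSE=%.3f" % rate_mse(centers, xs, w))
```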
I have also worked on such rate-distortion optimization, and it leads to nearly uniform quantization - so uniform quantization is not so bad after all.
PS. I had these 3 points written up previously, but couldn't avoid the "internal server error" (is there a way?) ... everything crashed there, and the forum is generally super slow ...