
Thread: Improving distribution and quantization for DCT coefficients

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts

    Improving distribution and quantization for DCT coefficients

    I am supposed to work as a consultant helping with the design of data compressors, which motivated me to take a closer look and rethink a few things - after directly predicting parameters (e.g. of a Laplace distribution) from context and optimizing the color transform, it is time to take a closer look at DCT coefficients. Some first observations: https://arxiv.org/pdf/2007.12055

    1) I have checked the DST and DCT families and DCT-II is indeed the best; trials of further optimization gave barely a few bits of improvement per 8x8 block,

    2) There is "common knowledge" that AC coefficients come from a Laplace distribution - I have taught it to students ... but looking at the data, this is just wrong!
    Checking the exponential power distribution (EPD) family rho ~ exp(-|x|^k), we don't get the k=1 of the Laplace distribution, but k~0.5, which is quite different: a more compact body, thicker tails.
    Using a Laplace distribution for values that actually come from a k=0.5 EPD wastes ~0.1 bits/value ... which for RGB without subsampling can mean a ~0.3 bpp file reduction.
    Here is such an evaluation; at the bottom, perfect agreement would be flat, while Laplace is the orange one:
    [attached image: MRnmjsG.png]
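    To make this concrete, here is a minimal sketch of such a check (my illustration, not code from the paper): fit the EPD shape with scipy's gennorm, whose shape parameter plays the role of k (k=1 is Laplace), and compare the average code length against a Laplace fit - the synthetic sample below just stands in for real dequantized AC coefficients.
    Code:
import numpy as np
from scipy.stats import gennorm, laplace

def compare_fits(ac_coeffs):
    x = np.asarray(ac_coeffs, dtype=float)

    # Maximum-likelihood fits, location fixed at 0 for AC coefficients.
    k, _, s_epd = gennorm.fit(x, floc=0)     # gennorm shape = EPD's k
    _, b_lap = laplace.fit(x, floc=0)

    # Mean negative log-likelihood in bits/value; the gap approximates
    # the bits wasted by assuming Laplace when the data is EPD-like.
    nll_epd = -gennorm.logpdf(x, k, loc=0, scale=s_epd).mean() / np.log(2)
    nll_lap = -laplace.logpdf(x, loc=0, scale=b_lap).mean() / np.log(2)
    return k, nll_lap - nll_epd

# Synthetic k=0.5 sample as a stand-in for real AC coefficients:
sample = gennorm.rvs(0.5, scale=1.0, size=100_000, random_state=0)
k_hat, penalty = compare_fits(sample)
print(f"fitted k ~ {k_hat:.2f}, Laplace penalty ~ {penalty:.2f} bits/value")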
    The question is: where exactly is the Laplace assumption used for AC coefficients? Those places might have this improvement potential ...
    One is the Golomb code, which is "optimal" for some Laplace distributions among prefix codes ... does anybody still use it in modern image/video compressors?
    I remember discussing PVQ with Jean-Marc Valin - I showed how to optimize it for a uniform distribution on the sphere, perfect for a multivariate Gaussian distribution ... but he said the values come from a Laplace distribution, for which the basic PVQ turned out better ... if it is k~1/2 instead, we again should be able to reduce MSE by ~10-20% with such a deformation.

    3) I have also worked on flexible quantization for such densities - automatically getting a better one than uniform, avoiding costly standard techniques like Lloyd-Max.
    Are there some more practical techniques?
    I have defined a "quantization density" q, choosing how dense the local quantization is - a kind of N->infinity limit of the number of quantization regions - to finally be used for finite N.
    Taking its inverse CDF on a size-N regular lattice, we get a (nonuniform) quantization, e.g.:
    [attached image: lbTp0nd.png]

    This q/Q quantization density can be optimized for a chosen density; the plot above is for minimizing MSE distortion, for which, as shown, we should choose
    q ~ rho^{1/3}
    so we should use e.g. twice denser quantization for 8 times denser regions.
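    A minimal sketch of this companding construction (my illustration; the EPD with k=0.5 is just an example density):
    Code:
import numpy as np

def nonuniform_quantizer(rho, xs, N):
    """rho: density values on a regular grid xs; N: number of regions."""
    q = rho ** (1.0 / 3.0)          # MSE-optimal quantization density
    cdf = np.cumsum(q)
    cdf /= cdf[-1]                  # normalize to a CDF on [0, 1]
    # Inverse CDF of q on a regular size-N lattice -> region boundaries.
    return np.interp(np.arange(1, N) / N, cdf, xs)

xs = np.linspace(-20, 20, 4001)
rho = np.exp(-np.abs(xs) ** 0.5)    # example density: EPD with k = 0.5
print(nonuniform_quantizer(rho, xs, N=16))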

    However, such distortion optimization increases entropy, due to the more uniform distribution over the quantization regions - which turns out to be a serious problem also for e.g. Lloyd-Max.
    I have also worked on the corresponding rate-distortion optimization, and it leads to nearly uniform quantization - so uniform quantization is not so bad after all.
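    A quick numerical illustration of this trade-off (again only a sketch, here for a Laplace source): with the same number of regions, the rho^{1/3} quantizer gives lower MSE but higher entropy of the quantized symbols than the uniform one.
    Code:
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(scale=1.0, size=200_000)
N, lim = 32, 10.0

def rate_distortion(boundaries):
    idx = np.digitize(x, boundaries)          # region index of each sample
    recon = np.array([x[idx == i].mean() if np.any(idx == i) else 0.0
                      for i in range(len(boundaries) + 1)])
    p = np.bincount(idx, minlength=len(boundaries) + 1) / len(x)
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # entropy of quantized symbols
    return np.mean((x - recon[idx]) ** 2), H   # MSE with centroid decoding

uniform = np.linspace(-lim, lim, N + 1)[1:-1]  # N uniform regions
xs = np.linspace(-lim, lim, 4001)
cdf = np.cumsum(np.exp(-np.abs(xs)) ** (1 / 3))  # q ~ rho^(1/3), rho = Laplace
cdf /= cdf[-1]
companded = np.interp(np.arange(1, N) / N, cdf, xs)

for name, b in [("uniform", uniform), ("rho^(1/3)", companded)]:
    mse, H = rate_distortion(b)
    print(f"{name:10s}  MSE={mse:.4f}  entropy={H:.2f} bits")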

    PS. I had written these 3 points before, but couldn't avoid the "internal server error" (is there a way?) ... everything crashed there, and the forum is generally super slow ...
    Last edited by Jarek; 24th July 2020 at 08:27.

  2. Thanks:

    Jyrki Alakuijala (25th July 2020)

  3. #2
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    I have just updated https://arxiv.org/pdf/2007.12055 by exploiting statistical dependencies inside the 8x8 DCT block - it turns out this allows for huge savings: ~0.5 bits/value (per pixel for grayscale, up to x3 for RGB).
    So DCT decorrelates well - additional linear(!) predictions between its coefficients make nearly no sense.
    However, there are also other statistical dependencies between them, like width dependence (between variances): the higher the absolute value of one coefficient, the larger the width of the Laplace we should use for its neighbors.
    It is shown in the left plots below: the higher the already decoded |DCT_12|, the larger the width of the Laplace we should use for DCT_13 - exploiting this gives ~0.4 bits/value savings.
    The bottom right contains the weights for finding the optimal width from coefficients already decoded in zigzag order - it becomes a bit costly, but inexpensive approximations should be close.
    If e.g. JPEG recompressors don't use it, expect them to finally break their 20-22% bound by a lot ...
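    For intuition, a simplified stand-in for this width prediction (not the paper's exact optimization): predict the Laplace scale of a target coefficient linearly from the absolute values of already decoded coefficients, and measure the bits/value gain over a single global scale.
    Code:
import numpy as np

def width_prediction_gain(decoded_feats, target):
    """decoded_feats: (n_blocks, m) absolute values of already decoded
       coefficients; target: (n_blocks,) values of the coefficient coded now."""
    X = np.column_stack([np.ones(len(target)), decoded_feats])
    # For Laplace(b), E|x| = b, so regress |target| on the features.
    w, *_ = np.linalg.lstsq(X, np.abs(target), rcond=None)
    b_pred = np.maximum(X @ w, 1e-6)          # predicted per-block widths
    b_const = np.abs(target).mean()           # single global width

    def nll_bits(b):                          # Laplace code length, bits/value
        return np.mean(np.log(2 * b) + np.abs(target) / b) / np.log(2)

    return nll_bits(b_const) - nll_bits(b_pred)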


  4. #3
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    Updated https://arxiv.org/pdf/2007.12055 with DCT prediction from already decoded neighboring blocks - both to improve compression ratio and to reduce blocking artifacts.
    We can use the DCT coefficients of these decoded blocks (left), or their final decoded values (right) - here are the visualized coefficients for both (and the bits/value savings for various approximations); as expected, it is sufficient to focus on the column/row of adjacent pixels:




    As a compromise between compression ratio and computational cost, I would perform a 1D DCT of the right-most columns and lowest rows of the decoded blocks, to use as features for predictions in new blocks, as there are frequency dependencies (Figure 7).
    Generally, the mean savings from such value predictions is ~0.05 bits/value, which is tiny compared to the ~0.5 bits/value savings from width prediction (of the Laplace distribution) - the latter seems rarely used (LOCO-I ... ?), which would mean a huge savings opportunity.
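    A minimal sketch of extracting such border features (my illustration; the per-coefficient regression on top of them is omitted):
    Code:
import numpy as np
from scipy.fft import dct

def border_features(left_block, top_block):
    """left_block, top_block: 8x8 arrays of already decoded pixel values."""
    col = left_block[:, -1]   # right-most column of the left neighbor
    row = top_block[-1, :]    # lowest row of the top neighbor
    # 16 features that can feed a per-coefficient linear predictor.
    return np.concatenate([dct(col, type=2, norm='ortho'),
                           dct(row, type=2, norm='ortho')])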

    Is something like above used in practice?
    Could anybody summarize methods used in JPEG repackers (PACKjpg, lepton, brunsli)?

  5. Thanks:

    MegaByte (27th August 2020)

  6. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by Jarek View Post
    ... I would perform 1D DCT of these right-most columns and lowest rows of decoded blocks ...

    Is something like above used in practice?
    Could anybody summarize methods used in JPEG repackers (PACKjpg, lepton, brunsli)?
    My understanding is that PackJPG (and possibly Lepton) do what you propose. I don't read other people's source code in compression so I don't know for sure.

    In brunsli I do a different thing -- I extrapolate what the 1D DCT would be at the border of the two blocks. I'm secretly rather proud of this technique, as it required a certain level of intuition in this domain. (Secretly, because usually no one wants to discuss this domain :-D) Intuitively this was a better approach for me; it is faster to compute, and we got better results with it when Zoltan compared the methods in practice (back in 2014 when we were developing that code).

    The math is super-cute when you extrapolate the 1D DCT from the 2D DCT. For one side you just sum the coefficients (row- or column-wise, depending on where you are extrapolating); for the other side you do an alternating sum.
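    Roughly, with an orthonormal DCT-II convention the identity looks like this (just a sketch of the idea, not the Brunsli code):
    Code:
import numpy as np

def boundary_1d_dct(C):
    """C: 8x8 array of 2D DCT coefficients C[u, v], v = horizontal frequency.
       Returns the 1D DCT (over u) of the columns extrapolated at the
       left (x = -1/2) and right (x = 7.5) block boundaries."""
    v = np.arange(8)
    a = np.where(v == 0, np.sqrt(1 / 8), np.sqrt(2 / 8))  # orthonormal scaling
    # cos(pi*v*(2x+1)/16) equals 1 at x = -1/2 and (-1)^v at x = 7.5, so:
    left = (C * a).sum(axis=1)                  # plain sum over v
    right = (C * a * (-1.0) ** v).sum(axis=1)   # alternating sum over v
    return left, right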

    In brunsli we use these extrapolated 1D DCTs as context for some of the 2D DCT coefficients. Using a context is a balance between slowing down decoding and gaining density -- so we don't use it for all coefficients.
    Last edited by Jyrki Alakuijala; 27th August 2020 at 12:13.

  7. Thanks:

    Jarek (27th August 2020)

  8. #5
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    Thanks - indeed, data compression generally (besides e.g. the patent minefields) suffers from a lack of documentation - even when source is available, it is often faster to just work with the data - like the results above, from a few hours of just testing basic approaches.

    Indeed, going from 2D to 1D DCT is nice; however, as you usually have to decode the values anyway, it seems cheaper to calculate the 1D DCT from the decoded values (... there is a subtle difference in the quantization errors).
    Extrapolating the next row/column is a good idea, but if it is done with linear combinations, it is already hidden in the linear regressions I have used - I trust data-driven optimizations more than e.g. interpolations (which contain hidden regularity assumptions).

    How do you choose width/uncertainty sigma for the distributions?

  9. #6
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by Jarek View Post
    How do you choose width/uncertainty sigma for the distributions?
    In brunsli the 1D DCT extrapolation is only used for context modelling. The distribution is measured from the actual DCT values and not modeled.

    The actual values are clustered further based on the system that I introduced in WebP lossless and later used in Brotli, too. We used this system in Brunsli and JPEG XL to cluster entropy codes, so that where the context modeling is not effective we don't need to pay the cost of having too many entropy codes.

  10. #7
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    There are strong statistical dependencies between neighboring values: besides linear value dependencies, the variance dependencies are also extremely strong and quite complicated, e.g. for DCT - I am skeptical about exploiting them effectively with clustering-like approaches ... the log-likelihood improvement of a model is literally the mean bits/value savings.

  11. #8
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    Quote Originally Posted by Jarek View Post
    There are strong statistical dependencies between neighboring values: besides linear value dependencies, the variance dependencies are also extremely strong and quite complicated, e.g. for DCT - I am skeptical about exploiting them effectively with clustering-like approaches ... the log-likelihood improvement of a model is literally the mean bits/value savings.
    Right. Clustering is nice even when it doesn't fully exploit the correlations since it can be made relatively fast to decode.

  12. #9
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    Finally added some evaluation with quantization to https://arxiv.org/pdf/2007.12055 - the blue improvements on the right use the 1D DCT of the column to the left and the row above (the plots show the coefficients: the prediction mainly uses the corresponding frequencies, so we can focus on them) - prediction of widths has a similar cost to prediction of values, but gives much larger improvements here.
    The red improvements use already decoded values in zigzag order - it is costly and the gain is relatively small, but its practical approximations should have similar gains.


  13. #10
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    876
    Thanks
    242
    Thanked 325 Times in 198 Posts
    In PIK and JPEG XL we optimize the (zigzag) order to get the best zero-RLE characteristics, too. There is also some additional benefit in keeping the variance decreasing along the order, for other reasons in the chosen codings.

  14. #11
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    803
    Thanks
    244
    Thanked 255 Times in 159 Posts
    Sure, the order might improve things a bit, but I am talking about exploiting dependencies inside the block - the tests show that linear predictions from already decoded coefficients in this block can give a few percent improvement, especially for width prediction (post #2 here).

