Thread: EMMA - Context Mixing Compressor

  1. #271
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    Quote Originally Posted by Shelwien View Post
    That sounds right. Please check decoding though
    Code:
    book1       178340   177154  0.665%   1186
    book2       115514   114976  0.465%    538
    world95     343561   341172  0.695%   2389
    dickens    1910432  1898916  0.602%  11516
    enwik6      201419   200192  0.609%   1227
    enwik8drt 16855079 16753949* 0.600% 101130
    
    * (1-x/16855079)*100=0.6
    cmix seems to only have this atm:
      AddByteModel(new PPMD(16, 1680, manager_.bit_context_));

    and paq8pxd18:
    ppmd_12_256_1.Init( 12+(x.clevel>8),210<<(x.clevel>8)<<(x.clevel>13),1,0);
    ppmd_6_32_1.Init( 3<<(x.clevel>8),16<<(x.clevel>8),1,0);
    For enwik8.drt the improvement is even better.
    The LTCB score of 16855079 bytes comes from EMMA 1.6 x64, but newer releases have slightly worse performance on enwik8.drt.
    My latest score, from EMMA 1.15 x64, is 16862787 bytes, so the mod_ppmd improvement would be 0.645%.

  2. #272
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    Darek, that was only a simulated result by Shelwien, based on the estimated 0.6% improvement.
    I've run a few more tests:

    Code:
    enwik8.drt
    
    16.811.229 bytes, EMMA x64 + mod_ppmd order 12, 128MB
    16.776.618 bytes, EMMA x64 + mod_ppmd order 12, 256MB
    16.734.970 bytes, EMMA x64 + mod_ppmd order 12, 512MB
    16.684.505 bytes, EMMA x64 + mod_ppmd order 12, 1024MB
    I'm currently testing enwik9.drt, but that's going to take a while (I had almost forgotten how boring it is to test with ludicrous mode active)...

    I also tested with orders 14, 16 and 18, but 12 always gave the best results on this file. I also tried using another mod_ppmd instance with order 6 concurrently with this one, but the gains on average were very poor, so I'll only be using a single instance. Currently I have the x86 version limited to 128MB and the x64 version limited to 1024MB, with selectable orders of 12, 14, 16 and 18. Do you guys want some more options to test with?

  3. #273
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    936
    Thanks
    556
    Thanked 375 Times in 280 Posts
    Quote Originally Posted by mpais View Post
    Darek, that was only a simulated result by Shelwien, on the estimation of a 0.6% improvement.
    Sorry, I hadn't noticed that...
    The 1024MB limit gives a 1.057% gain on enwik8.drt. Quite good!

  4. #274
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > I also tried using another mod_ppmd instance with order 6 concurrently with this one

    I guess you can also try o2 or o3 - at least paq8pxd18 has it.

    > Do you guys want some more options to test with?

    Can you check context?
    I mean, something like logging probability estimations generated by mod_ppmd and your main model,
    and then checking if maybe mod_ppmd only generates better results in some contexts, e.g. only for bit7 or bit6,
    or only after \x20 (space) symbols.

  5. #275
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    I tried other orders, it's just that order 6 was the best, on average, to combine with a separate instance running a higher order. But even then the gains were quite small. I'm testing some improvements; I'm down to 177.027 bytes for book1 on the x86 version.

    As for logging, it will be tricky to get the sort of data you want, but I'll see what I can do. The main model doesn't generate a single prediction (much like paq8 variants), and though I'm not familiar with mod_ppmd, I suspect the prediction it gives has already been "refined" by an SSE stage, while in EMMA the predictions from the main model get mixed with the ones from the other models and only then is SSE applied. So I'll have to log all the separate predictions from the main model, pre-SSE stage. And then there's ludicrous mode, which has information inheritance: if an order-4 context is not found (possibly due to a hash collision) but an order-3 context exists, it will use the stats from the lower context to get a rough estimate of the probability for the higher context. Should this "pseudo" order-4 estimate be discarded?
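
    (For illustration only, a minimal sketch of the inheritance fallback described above, with made-up counter types and smoothing; this is not EMMA's actual code.)
    Code:
    // Hypothetical "information inheritance": if the order-4 context is
    // missing (e.g. lost to a hash collision), derive a rough estimate
    // from the order-3 stats instead of reporting "no prediction".
    struct Counts { int n0 = 0, n1 = 0; };   // bit counts for one context
    
    double Estimate(const Counts* o4, const Counts* o3) {
      if (o4 && o4->n0 + o4->n1 > 0)
        return (o4->n1 + 0.5) / (o4->n0 + o4->n1 + 1.0);  // real order-4 stats
      if (o3 && o3->n0 + o3->n1 > 0)                       // inherited estimate,
        return (o3->n1 + 0.5) / (o3->n0 + o3->n1 + 1.0);   // used as order-4
      return 0.5;                                          // nothing known yet
    }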

  6. #276
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > The main model doesn't generate a single prediction (much like paq8 variants),

    Why, it kinda has to - what else do you feed to the arithmetic coder?

    Well, I guess it could be bytewise, but then wouldn't it be better if I provided
    the byte probability distribution from ppmd? It already exists there anyway.

    In any case, I was actually talking about the mixing stage.
    If some prediction is improved by mixing it with ppmd's, it means that
    ppmd's prediction is better sometimes.
    So I'm suggesting to try and see where specifically it is better,
    because normally the mixer is unable to notice any patterns when
    a specific model is only better in specific contexts.

    > though I'm not familiar with mod_ppmd, I suspect the prediction it gives has already been "refined" by a SSE stage,

    No, that's only used in ppmonstr. Ppmd mostly just directly uses the plain symbol freqs.
    In fact, it may be a good idea to try adding it there, because my SSE is considerably
    different from paq's.

    > while in EMMA the predictions from the main model get mixed with the ones from
    > the other models and only then will SSE be applied.

    Yes, and I only meant the mixer.
    I guess you can also try the same analysis for all submodels that you're mixing,
    but for my original idea you can just as well log the arithmetic coder's inputs without ppmd use, and with it,
    and see if there's a pattern in ppmd prediction improvements, i.e. flags like these:
    pbit_ppmd = bit ? (1-p0_ppmd) : p0_ppmd;
    pbit_old = bit ? (1-p0_old) : p0_old;
    flag = (pbit_ppmd > pbit_old);

    For example, ppmd uses a bytewise model, so it's possible that it simply gives
    better (smaller) probability estimations for unused symbols (enwik8 doesn't use the whole
    range 0..255 of byte values). In this case ppmd's predictions for bit7 and maybe bit6
    could be better.
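
    (As a rough sketch of the suggested logging, assuming p0_old/p0_ppmd are the two coders' probabilities of a zero bit and ctx_id/bitpos are whatever context labels one wants to tally by; names and output format are made up:)
    Code:
    #include <cstdio>
    
    // Log one comparison per coded bit: did the ppmd-augmented coder assign
    // more probability to the actual bit than the old coder did?
    void LogComparison(FILE* log, int ctx_id, int bitpos, int bit,
                       double p0_old, double p0_ppmd) {
      double pbit_old  = bit ? (1.0 - p0_old)  : p0_old;
      double pbit_ppmd = bit ? (1.0 - p0_ppmd) : p0_ppmd;
      int flag = (pbit_ppmd > pbit_old);        // 1 = ppmd version was better
      std::fprintf(log, "%d %d %d %.6f %.6f %d\n",
                   ctx_id, bitpos, bit, pbit_old, pbit_ppmd, flag);
    }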

    > Should this "pseudo" order 4 estimation be discarded?

    No, I'd consider that also a part of your model.

  7. #277
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    pbit_ppmd = bit ? (1-p0_ppmd) : p0_ppmd;
    pbit_old = bit ? (1-p0_old) : p0_old;
    flag = (pbit_ppmd > pbit_old);


    Ok, so p0_old would be the prediction I would normally feed the arithmetic coder, post-mixing and SSE, and p0_ppmd would be the prediction from mod_ppmd alone, untouched?
    I'll run some tests and report the results then.

    >If some prediction is improved by mixing it with ppmd's, it means that
    ppmd's prediction is better sometimes.

    From my experience with EMMA, it isn't as linear as that. If both the main model and mod_ppmd predict a 1 with probabilities of 80% and 78% respectively, the final prediction will most likely be around 90%, i.e. there is a kind of "confirmation bias": the final prediction is sometimes better than the best prediction from any single model. This occurs because the sum of the mixer's weights isn't bounded. So in this simplistic example, pfinal may be Clip( ( 1.21*0.8 + 1.14*0.78 )/2 ) = 0.93. So sometimes the prediction may be improved even though the prediction from the added model wasn't the best.
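
    (The toy numbers above in runnable form; the weights 1.21 and 1.14 are just the values from the example, not real EMMA weights:)
    Code:
    #include <algorithm>
    #include <cstdio>
    
    int main() {
      // Two models agree that the next bit is probably a 1.
      double p1 = 0.80, p2 = 0.78;
      double w1 = 1.21, w2 = 1.14;        // mixer weights; their sum is unbounded
      double pmix = (w1 * p1 + w2 * p2) / 2.0;
      pmix = std::min(std::max(pmix, 0.001), 0.999);   // Clip()
      std::printf("%.2f\n", pmix);        // ~0.93, higher than either input
      return 0;
    }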

  8. #278
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > Ok, so p0_old would be the prediction I would normally feed the arithmetic coder, post-mixing and SSE,
    > and p0_ppmd would be the prediction from mod_ppmd alone, untouched?

    No, p0_ppmd in this case would be the new model, with ppmd, also after mixing and SSE.

    > So sometimes the prediction may be improved even though the prediction from the added model wasn't the best.

    True, but it doesn't matter; we're trying to check whether ppmd just
    accidentally improves the results at random points, or if there's a visible
    contextual dependency among the points where the predictions are improved.

    If it's the latter, we can try further improving the results by adding ppmd
    to the mix only in specific contexts.

  9. #279
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    235
    Thanks
    102
    Thanked 140 Times in 102 Posts
    > Can you check context?
    > I mean, something like logging probability estimations generated by mod_ppmd and your main model,
    > and then checking if maybe mod_ppmd only generates better results in some contexts, like eg. only for bit7 or bit6,
    > or only after \x20 (space) symbols.

    > So I'm suggesting to try and see where specifically it is better,
    > because normally the mixer is unable to notice any patterns, when
    > a specific model is only better in specific contexts.

    Shouldn't this be done more efficiently by the mixer, instead of in a fixed way?
    Ok, the mixer must have the right context, but I guess that mod_ppmd can also be useful on non-text/unusual text/mixed data (DNA, EXE, CSV, UTF-*...).
    If we know that mod_ppmd is better only for e.g. bit7 of English text, how can we be sure it's always better when the text model is switched on?
    Do you suggest checking all models and not just mod_ppmd?

    > [...] we're trying to check whether ppmd just
    > accidentally improves the results at random points, or if there's a visible
    > contextual dependency among the points where predictions are improved.

    > If its the latter, we can try further improving the results by adding ppmd
    > to the mix only in specific contexts.

    What do you suggest doing when a predictor is dynamically added to the mixer?
    I found that sometimes it's better to add an "else" prediction when a model doesn't have one (e.g. when a match model doesn't match), and sometimes it's better to feed the mixer a flag to ignore the prediction (but then the mixer must check this flag every time it uses a predictor, slowing down the program :-( ).

    TIA

  10. #280
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > If we know that mod_ppmd is better only for e.g. bit7 for english text,
    > how can we be sure it's always better when the text model is switched on?

    My main idea about that was actually to use different mixer weights for ppmd
    depending on that context.

    > Do you suggest to check all models and not just mod_ppmd?

    Ideally, yes.

    Well, for my own coders I usually have a different option - I can simply
    collect all the relevant contexts, define masks for them, and then run
    my parameter optimizer.

    But in this case it would likely be faster to just check it visually first.

    > What do you suggest to do when a predictor is dynamically added to the mixer?

    I don't really see much problem in that... you can renormalize the remaining
    weights to the same total, for example.

    Also I don't like n-ary mixers anyway - I only use trees of binary mixers
    for my own coders, because there
    (1) the meaning of a mixer weight is very clear - it's the probability of one input
    being better than another; it's even possible to explicitly compute this
    probability, instead of using a specific mixer update;
    (2) it's possible to use different update parameters for different binary mixers;
    (3) it's possible to use arbitrary contexts for different mixers.

    And in such a mixer-tree model there's absolutely no problem with dynamically
    adding or removing some inputs - we'd simply skip mixing (and its update).
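
    (A very rough sketch of a single binary mixing node for such a tree; the logistic mixing and the update rule below are generic textbook versions, not my actual implementation:)
    Code:
    #include <cmath>
    
    // One binary mixer node: blends two probabilities in the logistic domain.
    // Its weight w can be read as "how often input b beats input a".
    struct BinMix {
      double w = 0.5;
      double mix(double pa, double pb) const {
        double sa = std::log(pa / (1 - pa)), sb = std::log(pb / (1 - pb));
        double s = (1 - w) * sa + w * sb;
        return 1 / (1 + std::exp(-s));
      }
      void update(double pa, double pb, int bit, double rate = 0.02) {
        // Move the weight toward whichever input predicted the actual bit better.
        double ea = bit ? (1 - pa) : pa, eb = bit ? (1 - pb) : pb;  // errors
        w += rate * ((eb < ea) ? (1 - w) : -w);
        if (w < 0.01) w = 0.01;
        if (w > 0.99) w = 0.99;
      }
    };
    
    // In a tree of such nodes, a missing submodel input is handled by simply
    // skipping its node (and its update) and passing the other input through.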

    P.S. I did add SSE to mod_ppmd predictions, book1 compression improved from 209933 to 204622.
    Will see how it affects paq8p or something.

  11. #281
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    I've made a special x64 build with logging capabilities. When logging is active, 2 predictions are calculated for each bit: one with the selected options, and another one with those same options except for the mod_ppmd model. These are then compared according to each context, and for each context the log registers how many times it was found (the value in brackets) and then, for each of the 8 bits encoded in that context (from msb to lsb), how often each model provided the better prediction. Note that these percentages won't necessarily sum to 100%, since I'm excluding any situation where both predictions are equal. The first percentage listed is for the prediction excluding the mod_ppmd model, so a higher first percentage means mod_ppmd hurt the prediction more often than it helped, and vice-versa for the second percentage.

    The log file is created in the same directory as EMMA, with the same name as the executable but with a ".log" extension (usually "emma.log"), so if you wish to run more than one instance, you should make several copies of both the executable and the preset file and change their names.

    Now for some results (sorry for the long post).

    Code:
    book1
    
    Context: None [768771]
    [35,85%/40,58%] [37,31%/40,59%] [43,15%/47,17%] [44,47%/48,29%] [45,69%/47,49%] [34,83%/31,84%] [41,44%/44,20%] [26,93%/21,61%]
    
    Context: After space 0x20 [125551]
    [43,83%/48,61%] [47,20%/50,62%] [47,86%/49,92%] [47,85%/49,82%] [48,49%/48,88%] [47,29%/48,79%] [47,44%/48,51%] [27,38%/22,00%]
    
    Context: After comma+space [8557]
    [41,74%/41,22%] [49,05%/48,50%] [48,69%/48,92%] [51,61%/46,51%] [50,20%/46,46%] [49,23%/48,56%] [53,00%/44,92%] [28,08%/18,90%]
    
    Context: After uppercase [16330]
    [31,25%/42,23%] [32,16%/41,51%] [48,27%/47,94%] [48,15%/47,50%] [48,92%/45,60%] [50,22%/34,07%] [47,20%/38,14%] [30,24%/22,42%]
    
    Context: After line feed 0x10 [16621]
    [40,96%/46,89%] [46,68%/49,89%] [46,78%/50,62%] [46,73%/49,23%] [49,90%/47,16%] [44,20%/48,32%] [47,66%/50,51%] [26,60%/22,51%]
    
    Context: After digit+space [28]
    [64,29%/35,71%] [53,57%/39,29%] [53,57%/42,86%] [50,00%/50,00%] [67,86%/28,57%] [64,29%/35,71%] [35,71%/64,29%] [14,29%/42,86%]
    
    Context: After first digit [489]
    [60,33%/37,42%] [62,58%/31,29%] [60,94%/31,70%] [55,42%/40,08%] [40,29%/24,34%] [37,01%/30,88%] [36,40%/35,79%] [33,54%/21,88%]
    
    Context: After first 2 digits [430]
    [64,19%/34,88%] [67,44%/26,51%] [68,14%/28,14%] [49,77%/47,44%] [46,74%/22,79%] [33,49%/36,05%] [55,58%/24,42%] [35,35%/21,63%]
    Code:
    enwik6
    
    Context: None [1000000]
    [32,48%/38,22%] [34,22%/40,10%] [39,60%/45,42%] [41,24%/46,46%] [42,45%/46,99%] [34,88%/37,27%] [38,03%/44,89%] [30,61%/27,82%]
    
    Context: After space 0x20 [134168]
    [40,57%/44,79%] [43,66%/47,40%] [44,00%/47,30%] [44,86%/46,36%] [46,31%/48,44%] [46,42%/46,09%] [44,47%/47,95%] [32,43%/28,83%]
    
    Context: After comma+space [7263]
    [41,14%/42,53%] [46,39%/50,82%] [45,86%/49,40%] [47,42%/49,30%] [49,08%/48,13%] [54,14%/43,91%] [53,61%/43,91%] [36,93%/26,30%]
    
    Context: After uppercase [37462]
    [32,75%/43,50%] [35,85%/45,04%] [44,37%/47,69%] [45,17%/48,71%] [44,94%/50,52%] [44,73%/43,98%] [44,72%/43,56%] [34,05%/32,52%]
    
    Context: After line feed 0x10 [11639]
    [30,57%/31,61%] [29,98%/30,31%] [34,11%/31,50%] [36,63%/31,72%] [38,02%/33,65%] [38,62%/43,90%] [41,08%/44,12%] [27,97%/26,75%]
    
    Context: After digit+space [1084]
    [38,01%/55,44%] [42,71%/52,12%] [46,59%/50,74%] [55,17%/42,90%] [53,41%/45,39%] [60,06%/38,10%] [56,55%/41,70%] [45,20%/29,70%]
    
    Context: After first digit [7448]
    [43,82%/50,51%] [43,56%/46,39%] [43,38%/45,54%] [45,84%/49,01%] [41,61%/41,61%] [36,68%/39,61%] [40,36%/46,09%] [31,20%/36,75%]
    
    Context: After first 2 digits [6044]
    [42,72%/47,22%] [46,46%/42,19%] [43,65%/44,52%] [45,53%/46,77%] [43,68%/38,42%] [38,37%/39,73%] [41,30%/47,29%] [32,83%/33,45%]
    Code:
    dickens
    
    Context: None [10192446]
    [32,53%/35,57%] [33,06%/35,46%] [38,55%/41,96%] [40,84%/44,25%] [41,84%/45,35%] [28,66%/28,26%] [34,82%/36,45%] [24,64%/22,77%]
    
    Context: After space 0x20 [1739994]
    [40,71%/46,34%] [43,95%/49,26%] [44,85%/48,97%] [44,41%/49,14%] [45,41%/48,19%] [43,74%/48,73%] [45,51%/45,20%] [23,95%/22,16%]
    
    Context: After comma+space [156771]
    [39,88%/40,45%] [46,05%/47,69%] [46,12%/48,92%] [46,59%/48,10%] [46,66%/46,55%] [46,58%/49,27%] [46,00%/50,68%] [24,57%/21,04%]
    
    Context: After uppercase [234780]
    [28,69%/31,57%] [29,05%/30,81%] [43,67%/42,65%] [43,96%/42,98%] [42,78%/42,79%] [37,16%/36,43%] [35,16%/37,22%] [22,83%/24,38%]
    
    Context: After line feed 0x10 [200783]
    [38,40%/40,16%] [41,32%/42,12%] [41,82%/42,84%] [42,51%/43,43%] [41,35%/42,09%] [43,01%/48,24%] [44,12%/46,31%] [23,13%/21,93%]
    
    Context: After digit+space [159]
    [25,16%/27,67%] [25,16%/23,27%] [33,96%/22,64%] [30,82%/26,42%] [33,96%/27,04%] [38,99%/25,16%] [23,90%/35,85%] [10,69%/14,47%]
    
    Context: After first digit [1048]
    [34,92%/34,35%] [33,49%/29,10%] [37,60%/26,72%] [35,31%/28,15%] [36,35%/24,52%] [28,44%/33,02%] [35,40%/19,94%] [16,60%/17,56%]
    
    Context: After first 2 digits [809]
    [29,05%/32,14%] [31,64%/29,79%] [32,76%/32,14%] [41,04%/24,23%] [41,90%/25,46%] [44,99%/24,97%] [34,98%/24,60%] [22,62%/15,20%]
    If anyone wants to run some tests, I've attached this interim version.

  12. #282
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Nice, but you kinda did too much, and I don't see anything in these stats.
    Can't you just write the log in a form like "1 22222 33333" (i.e. { bit, p1, p2 })?
    Then maybe I'd be able to generate something like this - http://nishi.dreamhosters.com/u/lzma_defl.png

  13. #283
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >Nice, but you kinda did too much, and I don't see anything in these stats.

    Actually, I see a lot of interesting stats there. Look at book1:

    Code:
    Context: None [768771]
    [35,85%/40,58%] [37,31%/40,59%] [43,15%/47,17%] [44,47%/48,29%] [45,69%/47,49%] [34,83%/31,84%] [41,44%/44,20%] [26,93%/21,61%]
    Ok, so in general, mod_ppmd gives a somewhat consistent boost, regardless of context, to the quality of the predictions for bits 7 to 3 and bit 1, all of them about 3 to 5%. But, and this surprised me, the predictions get worse for bits 2 and 0. This means that the mixer is assigning a high weight to the mod_ppmd model in certain circumstances where its prediction (unexpectedly to any mixer context) is bad compared to the other models. So the mixer is getting some benefit from the model, maybe because of that confirmation bias when the predictions are generally good and in agreement, but it's unable to derive the contexts in which it is failing. Since the compression ratio is improving anyway, it is safe to assume that the coding gains afforded by the improved predictions far outweigh the coding losses, but if we could find the causes for these losses, we could use that information to make the mixer's contexts aware of these discrepancies, and then the mixer would by itself learn to rely less on mod_ppmd's predictions when needed.

    Now look at this context:
    Code:
    Context: After uppercase [16330]
    [31,25%/42,23%] [32,16%/41,51%] [48,27%/47,94%] [48,15%/47,50%] [48,92%/45,60%] [50,22%/34,07%] [47,20%/38,14%] [30,24%/22,42%]
    So, immediately after an uppercase letter is seen, mod_ppmd is quite good at predicting the first 2 bits but, even more so, bad at predicting the last 3 bits. And in both cases the results deviate significantly from the contextless average seen above.

    And let's see if spaces make a difference too:
    Code:
    Context: After space 0x20 [125551]
    [43,83%/48,61%] [47,20%/50,62%] [47,86%/49,92%] [47,85%/49,82%] [48,49%/48,88%] [47,29%/48,79%] [47,44%/48,51%] [27,38%/22,00%]
    If you look at the spreads between the results, they closely match those of the contextless average, and for the middle bits it's basically a tie. So a single space in and of itself isn't important.

    Let's check line feeds:
    Code:
    Context: After line feed 0x10 [16621]
    [40,96%/46,89%] [46,68%/49,89%] [46,78%/50,62%] [46,73%/49,23%] [49,90%/47,16%] [44,20%/48,32%] [47,66%/50,51%] [26,60%/22,51%]
    Here we see that mod_ppmd gives a nice small improvement to every bit except the last. So it seems that bit 0 really gets worse predictions with mod_ppmd in most contexts. And if you look at the contexts for digits, you'll see that in almost every bit position, mod_ppmd causes significantly worse predictions.

    And now, just for fun, here is the effect of the adaptive learning rate when using just the main model at its lowest setting coupled with the mod_ppmd model:

    Code:
    Adaptive Learning Rate Off
    Context: After uppercase [16330]
    [15,73%/42,10%] [17,89%/40,63%] [47,46%/49,37%] [49,42%/47,18%] [45,71%/48,09%] [44,86%/49,90%] [37,46%/53,03%] [3,14%/57,32%]
    
    Adaptive Learning Rate On
    Context: After uppercase [16330]
    [16,93%/41,21%] [21,53%/39,17%] [48,41%/48,74%] [50,49%/46,56%] [46,72%/47,76%] [48,71%/46,41%] [40,43%/50,31%] [3,13%/70,92%]
    With no text model, we see that mod_ppmd is much better at modelling after capital letters than the main model, especially for the last bit(!). And with the adaptive learning rate active, it squeezes even more gains from it. This makes some sense intuitively: the last bit should be the easiest to predict, since we already have all the previous bits as context, so the prediction error should be small, in which case a reduced learning rate may help converge on a better local minimum.

    Just some food for thought.

    >Can't you just write the log in the form like "1 22222 33333" ( ie { bit, p1, p2 } ).

    Sure, if you prefer a more visual analysis. How about [Input Byte][P17..P10][P27..P20], with P1 and P2 quantized to 8 bits each? So for every input byte, you get 17 output bytes.

    Best regards

  14. #284
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > Just some food for thought.

    I'd be happy if it helps you improve the model, but imho the counts and differences
    are too small to draw any conclusions there.

    > Sure, if you prefer a more visual analysis. How about [Input Byte][P17..P10][P27..P20],
    > with P1 and P2 quantized to 8 bits each? So for every input byte, you get 17 output bytes.

    My scripts currently use bit/probability pairs anyway, but I can convert it if it's easier
    for you to write bytes. But please don't quantize the probabilities.

    ...

    So v4 is likely a failure (because it's slower and uses more memory, but the results are the same),
    but if you can make a console coder which would encode files from probability logs,
    I could try optimizing v4's SSE parameters based on the overall results.
    The idea is to skip computing the parts not affected by the parameter values, like counter lookups,
    log their values once, and then only iterate the parametric part.
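
    (The simplest form of such a console tool would just sum ideal code lengths from a { bit, probability } log; a sketch, assuming a hypothetical plain-text "bit p0" per-line format rather than whatever format we settle on:)
    Code:
    #include <cmath>
    #include <cstdio>
    
    // Reads "bit p0" pairs (p0 = probability of a zero bit) and reports the
    // ideal code length in bytes, i.e. what an arithmetic coder would output.
    int main(int argc, char** argv) {
      if (argc < 2) return 1;
      FILE* f = std::fopen(argv[1], "r");
      if (!f) return 1;
      int bit; double p0, bits = 0;
      while (std::fscanf(f, "%d %lf", &bit, &p0) == 2) {
        double pbit = bit ? (1.0 - p0) : p0;
        bits += -std::log2(pbit);
      }
      std::fclose(f);
      std::printf("clen=%.3f bytes\n", bits / 8.0);
      return 0;
    }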

  15. #285
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >Still, you can try testing it with EMMA - maybe with your mixing the results would be better?
    Code:
    book1
    177.153     v3, unaltered
    177.129     v4, unaltered
    177.027     v3, SSE done by EMMA
    177.122     v4, SSE done by EMMA
    
    book2
    114.965     v3, unaltered
    114.937     v4, unaltered
    114.919     v3, SSE done by EMMA
    114.932     v4, SSE done by EMMA
    
    enwik6
    200.197     v3, unaltered
    200.124     v4, unaltered
    200.088     v3, SSE done by EMMA
    200.107     v4, SSE done by EMMA
    The unaltered v4 results are slightly better than v3, but doing the SSE in EMMA gives even better results with v3.

    >I'd be happy if it helps you improve the model, but imho counts and differences
    >are too small to make any conclusions there.

    Well, it has already helped improve the prediction with SSE, and I'm currently trying to tweak some mixer contexts to see if I can squeeze out some more gains.
    Code:
    enwik8.drt
    16.684.505    v3, unaltered
    16.679.568    v3, SSE done by EMMA
    
    enwik9.drt
    135.277.490   v3, unaltered
    135.254.675   v3, SSE done by EMMA
    >My scripts currently use bit/probability pairs anyway, but I can convert it if its easier
    >to write bytes for you. But please don't quantize the probabilities.

    It's just a matter of speed, it already takes so long to compress.
    How about [Input Byte][P17,P27,..P10,P20] then? That's a 33x expansion (16 bits per prediction, only 12 used).

    >I could try optimizing v4 SSE parameters based on overall results.

    So you'd precompute the unaltered mod_ppmd predictions for a file, your optimizer would run an SSE test scenario on it and output a new log file with just the predictions, and then EMMA would use this file when encoding: instead of calling mod_ppmd to get a prediction, it would simply read the prediction and use it as usual (mix it with the rest, apply SSE, code it)? In that case this should ideally be a 2-pass version of EMMA: the first pass would write a log file with the individual predictions for each model, so the second pass would just read these predictions and those from mod_ppmd, and only do the mixing, SSE and coding stages. But in EMMA some of the mixing contexts use information from the internal state of the models, so you can't skip the modelling. And then there's all the parsing, which is done online - nothing is stored in the compressed files, there is no block segmentation. So it would take quite a big rewrite to make it possible.
    And even then, I'm not sure that would be ideal. It wouldn't take into account that I can just do the SSE in EMMA based on the unaltered mod_ppmd prediction and maybe get better results.

  16. #286
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    For a change, I tried applying v4's SSE to a paq8p log instead... 192108 -> 191437 for now (book1). Wonder if it would work with emma too.

  17. #287
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    191250 atm. So, can you post emma probability log for book1?

    Much less effect with paq8pxd18 though - 187415->187216.
    Any ideas about possible contexts? I tried adding SSE success runs etc, but it didn't help - just using order1 atm.

  18. #288
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >So, can you post emma probability log for book1?

    https://goo.gl/pIuYpP

    >Much less effect with paq8pxd18 though - 187415->187216.
    >Any ideas about possible contexts? I tried adding SSE success runs etc, but it didn't help - just using order1 atm.

    I'm currently at 176.985 for book1; I don't think I'll be able to improve it further since I don't have access to mod_ppmd's internal state.
    I don't know how your SSE works for mod_ppmd (bytewise or bitwise?), but for EMMA I'd try using:

    - current order, quantized, probably something like Min(?,Order) + log2(Max(1,Order-?))
    - most probable symbol in current context
    - number of symbols in current context (masked, so excluding symbols from escaped higher orders), quantized
    - difference in (masked) cumulative frequencies between the last context and the current one, quantized
    - (if bitwise only) already encoded bits from present symbol, with a leading 1 bit to disambiguate when encoding 0's
    - flags: have we escaped to a lower order? are we using inherited stats? does this context have non-ascii symbols (after masking)?
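
    (To make the idea concrete, here is a generic sketch of an SSE/APM stage keyed by such a combined context; the interpolated-bin table is a standard construction and the particular context quantizations below are arbitrary placeholders, not what either coder actually uses:)
    Code:
    #include <vector>
    
    // Generic SSE stage: maps (context, input probability) -> refined probability
    // through a table of interpolated bins, nudged toward each actual bit.
    struct SSE {
      enum { BINS = 33 };
      std::vector<double> t;
      int lo = 0, hi = 0;
      double frac = 0;
      explicit SSE(int contexts) : t(contexts * BINS) {
        for (int c = 0; c < contexts; c++)
          for (int i = 0; i < BINS; i++)
            t[c * BINS + i] = i / double(BINS - 1);    // start as identity map
      }
      double p(int ctx, double pin) {                  // pin = P(bit==1)
        double x = pin * (BINS - 1);
        int i = int(x);
        if (i >= BINS - 1) { i = BINS - 2; x = BINS - 1; }
        lo = ctx * BINS + i; hi = lo + 1; frac = x - i;
        return t[lo] * (1 - frac) + t[hi] * frac;
      }
      void update(int bit, double rate = 0.02) {
        t[lo] += rate * (bit - t[lo]) * (1 - frac);
        t[hi] += rate * (bit - t[hi]) * frac;
      }
    };
    
    // Example context built from a quantized PPM order and the masked symbol
    // count, as suggested above (both quantizations are made up).
    int MakeContext(int order, int nsyms) {
      int qo = order < 8 ? order : 8;                        // 0..8
      int qn = nsyms < 4 ? nsyms : (nsyms < 16 ? 4 : 5);     // 0..5
      return qo * 6 + qn;                                    // 54 contexts
    }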

  19. #289
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > https://goo.gl/pIuYpP

    Uh, can you tell what I'm doing wrong? http://pastebin.com/3XCTXEf7
    I'm getting clen=2914871.815 bytes from this script (run it as "coder book1.log book1.pd").
    Symbols are correct, but probabilities are somehow wrong?
    Can you maybe give me final probabilities that go to rangecoder?

    As to order, mpc, nummasked - I guess it's an idea, as I can provide these easily enough, same for the whole byte.
    However, some of the others don't really make sense for ppmd.

    P.S. Current SSE improvement for pxd18 - 16,516,347 -> 16,502,512 (book1 opt) -> 16,492,547 (enwik8 opt)

  20. #290
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    On a quick check, your struct is wrong. The probabilities are interleaved [P1,P2,..P1,P2].

    On enwik8.drt I'm at 16.675.959, so >1% gain from mod_ppmd. I guess I'll also have to improve the text model, since my models weren't written with maximum compression in mind.

    On an unrelated note, I've run some quick tests by plugging parts of the new grayscale model into the 24bpp image model and I'm getting up to 1% gains, so it seems worth it to write a new 24bpp model. Now I just need to find the time to do it...
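
    (For reference, the record layout as I understand it from the posts above: 1 input byte followed, for each of its 8 bits msb-first, by an interleaved [P1,P2] pair of 16-bit predictions, so 33 bytes per input byte; endianness and field names here are assumptions, not a spec:)
    Code:
    #include <cstdint>
    #include <cstdio>
    
    struct LogRecord {
      uint8_t  sym;         // the input byte
      uint16_t p[8][2];     // p[bit][0] = P1, p[bit][1] = P2 (only 12 bits used)
    };
    
    // Read one 33-byte record; fields are read separately to avoid struct padding.
    bool ReadRecord(std::FILE* f, LogRecord* r) {
      if (std::fread(&r->sym, 1, 1, f) != 1) return false;
      return std::fread(r->p, sizeof(uint16_t), 16, f) == 16;
    }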

  21. #291
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > On a quick check, your struct is wrong.

    Well, actually yours was wrong, because it wasn't what you described in https://encode.su/threads/?p=51533&pp=1 :)
    But thanks to the new info it worked.

    > The probabilities are interleaved [P1,P2,..P1,P2].

    My SSE improves it from 176,966 to 176,738 atm.
    I can post the converter and SSE coder if you want (with C++ source).

    > I guess I'll also have to improve the text model, my models weren't written
    > with maximum compression in mind.

    Note that the pxd18 result I mentioned is the one with the dynamic dictionary.
    When external drt is used (with its dictionary), it becomes 16387097 or so.

    > On an unrelated note, I've run some quick tests by plugging parts of the
    > new grayscale model into the 24bpp image model and I'm getting up to 1%
    > gains, so it seems worth it to write a new 24bpp model. Now I just need to
    > find the time to do it..

    1. Did you try mod_ppmd with non-text files?

    2. Are you planning to use color conversions for the 24bpp model?
    It could be neat to implement a lossless version of HSV for that (see the sketch at the end of this post for one common reversible alternative).

    3. There're actually multiple image types, independent of the color model etc.
    Even photos converted from raws are different from photos converted from jpegs.
    Also, the grayscale model is basically the same model that can be used for the intensity
    channel of 24bpp images (with color conversion).
    So I think that you'd gain more by making e.g. different models for photos
    and digital pictures (3D renders etc) than by trying to make universal models
    per color depth.
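
    (On point 2: for comparison, a widely used reversible integer color transform is YCoCg-R, sketched below; just an example of a lossless conversion, not necessarily related to what EMMA uses or to what a lossless HSV would look like.)
    Code:
    // YCoCg-R: a reversible integer color transform, usable as a cheap
    // decorrelation step before lossless modelling of the channels.
    void RGB2YCoCgR(int r, int g, int b, int& y, int& co, int& cg) {
      co = r - b;
      int t = b + (co >> 1);
      cg = g - t;
      y  = t + (cg >> 1);
    }
    void YCoCgR2RGB(int y, int co, int cg, int& r, int& g, int& b) {
      int t = y - (cg >> 1);
      g = cg + t;
      b = t - (co >> 1);
      r = b + co;
    }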

  22. #292
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >Well, actually yours was wrong, because it wasn't what you described in https://encode.su/threads/?p=51533&pp=1

    See post #286 -> [Input Byte][P17,P27,..P10,P20] then? 33x expansion (16 bits for predictions, only 12 used)

    >I can post the converter and SSE coder if you want (with C++ source).

    Sure, thanks

    >1. Did you try mod_ppmd with non-text files?

    Yes, but on images EMMA shuts down all other models except for the match model. On most non-textual files the gain is smaller but still interesting.

    >2. Are you planning to use color conversions for 24bpp model?

    EMMA already has an optional color transformation - I've tried several, actually - and I might make it user-selectable.

    >So I think that you'd gain more by making eg. different models for photos and digital pictures (3D renders etc) than by trying to make universal models per color depth.

    EMMA already does that. On any 8bpp image that it recognizes, it parses the palette to see if it represents a grayscale image. So for 8bpp I have a model for grayscale and another for color-palette images. Even then, my grayscale model was designed to handle both natural and artificial images (see the results listed for the imagecompression.info testset). The same goes for the 24bpp model, and by extension the 32bpp one, which can handle the 4th channel as pixel data or transparency. I also have models for 4bpp and 1bpp, which are quite useful for executables, since they handle many resources.

    I then have a generic image model, which handles 1 to 16bpp images, in 1 to 4 channels, bitpacked or not, with or without control bytes, and with a variable pixel layout, to accommodate different CFA layouts from digital cameras. This model handles a lot of raw formats, TIFFs, DICOM images, etc. I then have specific models for some popular raw formats, such as ARW from Sony and RW2 from Panasonic.

    And obviously there are the JPEG and GIF models, though the latter gives sub-par results since it doesn't decompress the LZW-encoded pixel data; it works by predicting the LZW indexes, so it doesn't fail for some images like the transform in paq8 does, but the achievable ratios suffer a lot. That is why I never made a PNG model: it would be really complex and give mediocre results compared to something like your Reflate.

  23. #293
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > See post #286

    Ok, so I missed it :)

    >I can post the converter and SSE coder if you want (with C++ source).

    http://nishi.dreamhosters.com/u/SSE_v0.rar
    Mainly sh_SSE1.inc and MOD/* (these are generated though)

    > This model handles a lot of raw formats, TIFFs, DICOM images, etc.

    That's pretty cool, quite a long list.

    Do you have a match model for images, something like "motion compensation" in videos?

    > And obviously there are the JPEG and GIF models,

    Did you see this btw - http://nishi.dreamhosters.com/u/001.avi
    It's a10.jpg from SFC, converted to a video with one FFT coef per frame.

    > That is why I never made a PNG model, it would be really complex
    > and give mediocre results when compared to something like your Reflate.

    Reflate is actually pretty bad at handling pngs atm.
    That is, it generates too many diffs because of the 4k winsize, etc.
    The plan is to make a specialized png recompressor based on reflate.
    Ideally, it would output a bmp image + diff data, but that still
    requires writing a recompressor for png's delta modes.

  24. #294
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >Do you have a match model for images, something like "motion compensation" in videos?

    I planned on trying it with the 8bpp color-palette model, on the idea that it may help with dithering patterns. Something like a 2D-match model, or as you describe it, apply motion-compensation pattern matching techniques but use them on previously seen pixel-data instead of previous frames. I also have code, currently on hold for that model, to look for symmetries, on the idea that on non-photographic color-palette images (such as icons, cartoons, etc) there is often a lot of symmetry that a simple predictive model based on pixel neighborhoods may miss, but a mirrored match along the symmetry axis can provide important data.

    >The plan is to make a specialized png recompressor based on reflate.

    I had a pretty good sketch of how I'd do the PNG model, combining ideas from the JPEG and GIF models. But since some types of data will require pre-processing anyway (EMMA is purely a streaming compressor, with the lowest latency possible), sooner or later I'd have to add that to EMMA, and there is already a lot of research on that, so I decided it wasn't worth the effort. If successful, that would allow me to use my existing models, which provide much better compression. The LZW encoding is simpler and yet it proved quite hard to predict the codewords, and I made the erroneous assumption that GIF encoders respected the LZW algorithm, which led to problems with the model rejecting some GIFs when it thought they were corrupted. So I dread the number of special corner cases that I'd have to account for with PNG...

  25. #295
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    I was given a weird jpeg file that even lepton compresses better than paq8px.
    Code:
    http://nishi.dreamhosters.com/u/fail1.jpg
    
    2,307,541 1.jpg
    2,307,428 1.7z
    1,826,864 pjpg
    1,821,830 rejpeg
    1,738,654 winzip
    1,727,670 paq8px75 -5
    1,726,908 jojpg -5
    1,726,774 jojpg -6
    1,725,816 paq8pxd18 -s7
    1,721,643 lepton
    1,707,722 packjpg
    1,675,719 emma 1.20 (image slow)
    1,675,651 emma 1.20 (best)
    1,675,032 stuffit14
    1,673,355 stuffit14 (thumbnail disabled)
    Do you use a translated jpegModel from paq8px in emma?
    If so, do you remember what kind of bugs you fixed there? :)

  27. #296
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >I was given a weird jpeg file, that even lepton compresses better than paq8px.

    There's nothing "weird" about it; I have several files which compress better with Lepton than with paq8px. I'm guessing Lepton and StuffIt don't make the same mistake (see point 2 below).

    >Do you use a translated jpegModel from paq8px in emma?
    >If so, do you remember what kind of bugs did you fix there?

    The JPEG model in EMMA is based on the one in paq8px; I studied it before creating my own and found several shortcomings:

    1) I completely decoupled the parser from the modelling stage.

    Originally, my parser was meant to handle an unspecified level of recursively embedded thumbnails (so, a thumbnail within a thumbnail within..).
    In practice, I never found any file with more than one level of embedded thumbnails, so I simplified it.
    The parser reads the necessary segments and handles building the quantization and huffman tables, which are then passed to the model.
    If the huffman tables are missing, it loads the default huffman tables (cf. JPEG standard section K.3), which allows EMMA to compress MJPEG video frames.

    In paq8 variants, the initial parser is very basic and serves only to do block segmentation. So the JPEG model is called for the whole block,
    and it is there that the segments are read. Unfortunately, it doesn't handle embedded thumbnails or missing segments.

    2) I handle sub-sampling patterns with respect to determining neighboring blocks.

    When predicting the quantized DCT coefficients, the previously decoded horizontally and vertically adjacent blocks of the same color component are used.
    To do this correctly, you have to take into account the sub-sampling factors; paq8px fails here. With 4:1:1 sub-sampled images such as the one you posted,
    this is crucial (see the sketch at the end of this post).

    3) I predict RSTx markers and dummy bytes

    4) I have separate modelling for MJPEG frames, to try to exploit inter-frame correlation.

    That's about it; if one fixes these, paq8px should at least get results comparable to EMMA.
    My models (apart from the new grayscale 8bpp model) weren't designed with maximum compression in mind; most are actually lower in complexity than the ones in paq8px.
    Instead, I focused on making efficient use of all the available data to improve the predictions.
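
    (To make point 2 concrete, a rough sketch of mapping a component's n-th block, in interleaved baseline decode order, to its position in that component's own block grid, which is what you need to find the true left/above neighbors. This is just an illustration with made-up names, not the actual EMMA or paq8px code:)
    Code:
    // In an interleaved baseline scan, each MCU holds h*v blocks of a component
    // in raster order. Given the n-th block of that component (decode order),
    // compute its (bx, by) coordinates in the component's block grid, so that
    // the left neighbor is (bx-1, by) and the above neighbor is (bx, by-1).
    struct BlockPos { int bx, by; };
    
    BlockPos ComponentBlockPos(int n, int h, int v, int mcus_per_row) {
      int per_mcu = h * v;
      int mcu     = n / per_mcu;
      int within  = n % per_mcu;
      BlockPos pos;
      pos.bx = (mcu % mcus_per_row) * h + (within % h);
      pos.by = (mcu / mcus_per_row) * v + (within / h);
      return pos;
    }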

  29. #297
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > it doesn't handle embedded thumbnails nor missing segments

    Did you try using the embedded thumbnail as a context for the main picture?

    > you have to take into account the sub-sampling factors. paq8px fails here.

    Thanks, I suspected this, but also considered other possibilities, like
    high resolution / low quality (so lots of RLE) and weirdly optimized huffman tables.

    > I'm guessing Lepton and StuffIt don't make the same mistake

    Afaik they compress the components separately, basically treating them like a bunch
    of bitplane pictures. It's just that stuffit has multiple models for different
    types of pictures and can optimize their choice per subblock.

    That's why it's kinda interesting how much more potential there is for
    jpeg recompression.

  31. #298
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    >Did you try using embedded thumbnail as context for the main picture?

    That was one of my ideas to improve the prediction of DC coefficients, but it would involve doing the iDCT (so a significant slowdown), and most thumbnails are very scaled down, so each decoded pixel in the thumbnail might represent the average of more than 8x8 pixels in the main image. Using the iDCT I would also like to try using the luma component to help predict the chroma components when they are sub-sampled.

    >That's why its kinda interesting how much more potential there is for
    >jpeg recompression.

    For high quality baseline jpeg images I don't think you can expect much, since most of the low-hanging fruit has already been picked. However, on low quality images, where the distribution is more heavily skewed and Huffman coding therefore gives poor results, there may still be some reasonable gains: better modelling of the zero-runs (especially chroma from luma), pre-training some simple quantized contexts (since on low quality images the DC coefficient holds most of the energy and most AC coefficients are 0), and recompression of the huffman and quantization tables themselves, which on small images account for a much larger percentage of the file size. Have you tried any of these?

  33. #299
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,237
    Thanks
    192
    Thanked 968 Times in 501 Posts
    > Have you tried any of these?

    No, I only made a fast model for pjpg, with a single counter
    per bit of the unary code of the coefs (well, it's not full unary, more
    like Elias gamma).

    Also, I have a different method of recompression for jpegs,
    the steganography-based one.
    Afaik, all pixel transforms in jpeg are linear, so it should be
    possible to convert quantized spectral coefs to color component ranges.
    Then we can fill these ranges with useful data via steganography,
    and compress the bitmap with a lossless image model.
    Of course, the compressed bitmap size would likely be even larger
    than the original jpeg, but we'd be able to subtract the size of
    the steganography payload from that.

  35. #300
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    EMMA v0.1.21, available at https://goo.gl/EDd1sS

    Code:
    Changes:
    - mod_ppmd model by Eugene Shelwien
    - Improved the 8-bit grayscale image model
    - Command-line parsing
    - Some small tweaks
    You can now use the command line to (de)compress files:
    "C i input_file output_file", where i is the index of the preset to use when compressing (0-based), and "D input_file output_directory" for decompression.

    I've finished improving the mixing and refinement stages for the grayscale 8bpp model; here are the results (same files as before):

    Attachment 4819

    Attachment 4820

    mod_ppmd also gives some nice improvements:

    Code:
    book1
    178.285 EMMA 0.1.21 x86, no mod_ppmd
    176.911 EMMA 0.1.21 x86, with mod_ppmd
    
    book2
    115.560 EMMA 0.1.21 x86, no mod_ppmd
    114.865 EMMA 0.1.21 x86, with mod_ppmd
    
    enwik6
    201.708 EMMA 0.1.21 x86, no mod_ppmd
    200.228 EMMA 0.1.21 x86, with mod_ppmd
    
    dickens
    1.909.172 EMMA 0.1.21 x86, no mod_ppmd
    1.897.327 EMMA 0.1.21 x86, with mod_ppmd
    
    acrord32.exe
    866.046 EMMA 0.1.21 x86, no mod_ppmd
    862.625 EMMA 0.1.21 x86, with mod_ppmd
    
    M.dbf
    48.419 EMMA 0.1.21 x86, no mod_ppmd
    47.983 EMMA 0.1.21 x86, with mod_ppmd
    
    Q.wk3
    193.198 EMMA 0.1.21 x86, no mod_ppmd
    191.173 EMMA 0.1.21 x86, with mod_ppmd
    Could a moderator be so kind as to update the first post with the new link? Thanks in advance
