
Thread: paq8px

  1. #1321
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by Gotty View Post
    Thanx Sebastian - that seems to be a bug report.
    Unfortunately I don't have working knowledge in that area.
    Do you happen to know where the problem is (in the code), or did you just experience that some audio content is not detected properly?
    None of the testfiles from rarewares.org get detected properly

    The bug is between line 9455 and 9498, the file detection code is a mess

    I think it is this line

    Code:
    if (p==16+wavlen && (buf1!=0x666d7420 || bswap(buf0)!=16)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && bswap(buf0)!=16);
    The fmt-chunk size is 18 for wave-extensible files, so adding "bswap(buf0)!=18" should fix it

  2. Thanks:

    Gotty (17th August 2018)

  3. #1322
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Quote Originally Posted by Sebastian View Post
    the file detection code is a mess
    Couldn't agree more.
    It's on my long-term to-do list to fix the readability and maintainability there. Unfortunately I do not have the proper knowledge (yet) to do so.
    Thanx for the fix-suggestion. I'll try that when I get back to my pc.

  4. Thanks:

    Mike (17th August 2018)

  5. #1323
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    >However it's not perfect: not every file is happy, and mozilla himself was even better in an earlier version (v152 was: 7 885 780). I'll need to find an equilibrium. Keep fingers crossed.
    That's normal - some files could be "sacrificed" if the total compression or most of the files improve...
    Regarding the mozilla file - paq8pxd's best score (for v48) is 7 384 310 - about 500 KB less, due to improved parsers (e.g. DEC). Maybe this is a way to improve mozilla compression even more - to find "unserviced" types of standards/files.

    By the way - I've added best-score highlights to the table and a minimum-size column with the summary of best scores = the theoretical paq8px minimum.


    Attached thumbnail: paq8px_v155_4CorpusesUpd.jpg

  6. Thanks:

    Gotty (19th August 2018)

  7. #1324
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    577
    Thanks
    220
    Thanked 833 Times in 341 Posts
    Quote Originally Posted by Gotty View Post
    Code:
    ...
      - on mismatch, delta mode didn't switch immediately: fixed
    ...
    Nice catch, in EMMA the models first predict() and then update(), and when merging everything I was focused on getting fast mode to work and forgot about that detail.

    Quote Originally Posted by Gotty View Post
    Code:
    ...
      - the default context was c0, now it is 0 (indicating "no match" or "mismatch")
      - skipping mixing (add(0)) when state is empty (unknown context).
    ...
    Did you test these changes in isolation, i.e., without the other small changes, to see their effect on compression?
    The reason I left the default context at c0 was because then the mixer would probably still keep the weight for those inputs as somewhat useful, since they might only be in use when we have a match or are in delta mode.
    The improvements reported by Darek are really small, and might just be because of the fixed delta mode with the new context or the move to using the normalModel for special data types too.

    As for mozilla, the degradation from v153 onwards is most likely due to the additional block segmentation information introduced by the new text detection.

    Quote Originally Posted by Sebastian View Post
    None of the testfiles from rarewares.org get detected properly

    The bug is between line 9455 and 9498, the file detection code is a mess

    I think it is this line

    Code:
    if (p==16+wavlen && (buf1!=0x666d7420 || bswap(buf0)!=16)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && bswap(buf0)!=16);
    The fmt-chunk size is 18 for wave-extensible files, so adding "bswap(buf0)!=18" should fix it
    That won't do it. Here, this should fix it:

    if (p==16+wavlen && (buf1!=0x666d7420 || ((wavm=bswap(buf0)-16)&0xFFFFFFFD)!=0)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && (wavm&0xFFFFFFFD)!=0);

    I'll include this fix in the next version.
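    For reference, a minimal sketch of why that mask works (illustrative code only, not the detection routine itself): subtracting 16 and masking with 0xFFFFFFFD clears bit 1, so the result is zero only when the fmt chunk size is 16 or 18.

    Code:
    #include <cassert>
    #include <cstdint>

    // Accept exactly the two fmt-chunk sizes discussed above: 16 (plain PCM)
    // and 18 (the extended fmt chunk Sebastian mentions). Any other size fails.
    static bool fmtSizeOk(uint32_t size) {
      return ((size - 16) & 0xFFFFFFFDu) == 0; // zero only when size-16 is 0 or 2
    }

    int main() {
      assert(fmtSizeOk(16) && fmtSizeOk(18));
      assert(!fmtSizeOk(17) && !fmtSizeOk(20) && !fmtSizeOk(40));
      return 0;
    }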

    Quote Originally Posted by Gotty View Post
    Couldn't agree more.
    It's on my long-term to-do list to fix the readability and maintainability there. Unfortunately I do not have the proper knowledge (yet) to do so.
    Thanx for the fix-suggestion. I'll try that when I get back to my pc.
    The code being a mess and limiting was one of the reasons I started Fairytale, to build something better. As-is, I'd have to make things a lot messier just to port the parsers from EMMA.

  8. #1325
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    What about adding an ordinary least squares predictor to the image model?
    The code is already in WAV_MODEL but I could provide a cleaner version. You just use some neighborhood pixels and adaptively optimize the coefficients.
    Sure, using some neighborhood pixels as context is already a variant of this, but it would either simplify the model or improve compression compared to using fixed-weight predictors.

  9. #1326
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Quote Originally Posted by mpais View Post
    Did you test these changes in isolation, i.e., without the other small changes, to see their effect on compression?
    The reason I left the default context at c0 was because then the mixer would probably still keep the weight for those inputs as somewhat useful, since they might only be in use when we have a match or are in delta mode.
    Yes. All changes in code brought improvements here and there. I had the idea of skipping prediction when the matchModel has no clue, so the weights are kept intact in the Mixer until the matchModel kicks in again. And it worked.

    Quote Originally Posted by mpais View Post
    As for mozilla, the degradation from v153 onwards is most likely due to the additional block segmentation information introduced by the new text detection
    Luckily I already know that the mozilla degradation in v156 is caused by the "add(0)"s in ContextMap. Most if not all small files benefited from this change. The large files in Silesia didn't. Now I (think I) know why. state==0 is not useful for many files: the mixer sees predictions from state==0 as just garbage, since new contexts are usually different from each other. But there are some files where new contexts (state==0) have some similarities, and for them it is useful to keep mixing the predictions for state==0. It looks like mozilla is one of them.
    Unfortunately I tested with the smaller corpuses only, and so could not see what happens to Silesia in the meantime. Silesia is a bit big and takes a night for me to run one round.
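    For illustration, a minimal sketch (with assumed names - this is not the actual paq8px Mixer) of why an add(0) input leaves its mixer weight untouched: the weight update is proportional to the input, so a zero input produces a zero update, and the weight stays frozen until that model has an opinion again.

    Code:
    #include <cmath>
    #include <vector>

    // Toy logistic mixer: add(i, 0) means "no opinion" for input i.
    struct TinyMixer {
      std::vector<float> w, x;
      explicit TinyMixer(int n) : w(n, 0.f), x(n, 0.f) {}
      void add(int i, float v) { x[i] = v; }                    // v = stretch(p), or 0 for "no opinion"
      float mix() const {
        float dot = 0;
        for (size_t i = 0; i < w.size(); i++) dot += w[i] * x[i];
        return 1.f / (1.f + std::exp(-dot));                    // squash to a probability
      }
      void update(int bit, float lr = 0.02f) {
        const float err = (bit - mix()) * lr;
        for (size_t i = 0; i < w.size(); i++) w[i] += err * x[i]; // x[i]==0 => w[i] unchanged
      }
    };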

    Quote Originally Posted by mpais View Post
    I'll include this fix in the next version.
    Thanx for that.
    In my next version I'm bringing more improvements to the matchModel, and fixing the add(0) problem mentioned above. The smaller corpuses are tested and are happy; Silesia will be on the run after a couple of hours. Then I'll post. Did you begin a new version yourself?

    Edit: Ah, forgot one thing: thank you so much for the excellent matchModel! The best models are those that are flexible and easy to improve. And this is such a model.
    Last edited by Gotty; 18th August 2018 at 18:56.

  10. #1327
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    577
    Thanks
    220
    Thanked 833 Times in 341 Posts
    Quote Originally Posted by Sebastian View Post
    What about adding an ordinary least squares predictor to the image model?
    The code is already in WAV_MODEL but I could provide a cleaner version. You just use some neighborhood pixels and adaptively optimize the coefficients.
    Sure, using some neighborhood pixels as context is already a variant of this, but it would either simplify the model or improve compression compared to using fixed-weight predictors.
    Sure, go for it, but I doubt you'll see significant compression improvements from it.
    And right now I'd say the weakest point of the model is with artificial images, like graphics and/or text, where the sharp discontinuities make most linear predictors useless and we need to rely on the texture descriptors of the contextMap.

    Quote Originally Posted by Gotty View Post
    Yes. All changes in code brought improvements here and there. I had the idea of skipping prediction when the matchModel has no clue, so the weights are kept intact in the Mixer until the matchModel kicks in again. And it worked.
    Cool, then I don't need to run any tests. And nice to see that we had similar ideas and yet tried different approaches.

    Quote Originally Posted by Gotty View Post
    In my next version I'm bringing more improvements to the matchModel, and fixing the add(0) problem mentioned above. The smaller corpuses are tested and are happy; Silesia will be on the run after a couple of hours. Then I'll post. Did you begin a new version yourself?
    My internal version is usually 3 or 4 versions ahead of the code I publish; I just remove what I still haven't finished testing/improving and publish what I've got.
    For the next version I'll probably keep improving the SSE stage, since you want to work on the mixer. No point in improving the mixing stage if the SSE stage is weak.

    Quote Originally Posted by Gotty View Post
    Edit: Ah, forgot one thing: thank you so much for the excellent matchModel! The best models are those that are flexible and easy to improve. And this is such a model.
    Well, it was just a quick hack, I honestly didn't expect much improvement from it.
    On my (way, way too long) to-do list is a 2D fuzzy match model for the image models, to see if it can help on what I described above.
    And right now the 8bpp image model already looks for horizontal symmetries, i.e., it looks for mirrored features, which we could also use on the 24/32bpp image model.

  11. #1328
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by mpais View Post
    Sure, go for it
    I quickly put something together.

    Parameters are:
    n=number of predictors
    kmax=update interval (default at 1, larger is faster)
    lambda=decay rate of the covariance estimation (default=0.998)
    nu=Tikhonov regularization parameter (default=0.001)

    Usage example:

    OLS<double> ols(2,1,0);
    ols.Predict({1,2});
    ols.Update(5);
    std::cout << ols.Predict({1,2}) << '\n';


    The causal neighborhood should be large (not just 5 pixels)

    This algorithm reflects the method from the following paper:
    https://ieeexplore.ieee.org/document/727397/

    Improvements could be made by weighting pixels farther away less (e.g. by Manhattan distance) - edges are also a problem.


    #include <cmath>
    #include <vector>

    // general linear model of the form p = w1*p1 + w2*p2 + ... + wn*pn
    template <class T>
    class OLS {
      typedef std::vector<T> vec1D;
      typedef std::vector<vec1D> vec2D;
      const T ftol = 1E-8;
    public:
      OLS(int n, int kmax=1, T lambda=0.998, T nu=0.001)
        : n(n), kmax(kmax), km(0), lambda(lambda), nu(nu),
          x(n), w(n), b(n), mcov(n, vec1D(n)), mchol(n, vec1D(n)) {}

      // weighted sum of the current predictor inputs
      T Predict(const vec1D &p) {
        x = p;
        T sum = 0.;
        for (int i=0; i<n; i++) sum += w[i]*x[i];
        return sum;
      }

      // update the exponentially decaying (co)variance estimates and,
      // every kmax samples, re-solve for the coefficients
      void Update(T val) {
        for (int j=0; j<n; j++)
          for (int i=0; i<n; i++) mcov[j][i] = lambda*mcov[j][i]+(1.0-lambda)*(x[j]*x[i]);
        for (int i=0; i<n; i++) b[i] = lambda*b[i]+(1.0-lambda)*(x[i]*val);
        km++;
        if (km>=kmax) {
          if (!Factor()) Solve(b, w);
          km = 0;
        }
      }
    private:
      // Cholesky factorization of (mcov + nu*I) into mchol; returns 1 on failure
      int Factor() {
        mchol = mcov; // copy the matrix
        for (int i=0; i<n; i++) mchol[i][i] += nu; // Tikhonov regularization
        for (int i=0; i<n; i++) {
          for (int j=0; j<i; j++) {
            T sum = mchol[i][j];
            for (int k=0; k<j; k++) sum -= mchol[i][k]*mchol[j][k];
            mchol[i][j] = sum/mchol[j][j];
          }
          T sum = mchol[i][i];
          for (int k=0; k<i; k++) sum -= mchol[i][k]*mchol[i][k];
          if (sum>ftol) mchol[i][i] = std::sqrt(sum);
          else return 1; // matrix not positive definite
        }
        return 0;
      }

      // forward and backward substitution using the Cholesky factor
      void Solve(const vec1D &rhs, vec1D &sol) {
        for (int i=0; i<n; i++) {
          T sum = rhs[i];
          for (int j=0; j<i; j++) sum -= mchol[i][j]*sol[j];
          sol[i] = sum/mchol[i][i];
        }
        for (int i=n-1; i>=0; i--) {
          T sum = sol[i];
          for (int j=i+1; j<n; j++) sum -= mchol[j][i]*sol[j];
          sol[i] = sum/mchol[i][i];
        }
      }

      int n, kmax, km;
      T lambda, nu;
      vec1D x, w, b;
      vec2D mcov, mchol;
    };
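    As a rough illustration only (the neighbor names below are placeholders, not paq8px identifiers), this is how the class might be wired into an 8bpp image model: predict the current pixel from a causal neighborhood, then adapt the coefficients with the true value once the pixel has been coded.

    Code:
    // Hypothetical glue code around the OLS class above (the neighbor names are
    // placeholders). Predict before coding the pixel, update after its value is known.
    double predictPixel(OLS<double>& ols, int W, int N, int NW, int NE) {
      return ols.Predict({ double(W), double(N), double(NW), double(NE) });
    }
    void updatePixel(OLS<double>& ols, int actual) {
      ols.Update(double(actual));   // online coefficient adaptation
    }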
    Last edited by Sebastian; 18th August 2018 at 23:02.

  12. Thanks:

    mpais (19th August 2018)

  13. #1329
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts

    Pull request: paq8px_v157

    Code:
    - New hash functions (currently used in matchModel only)
    - Compression enhancement: "skipping mixing (add(0)) when state is empty (unknown context)" from v156 is now "skipping some mixing when state is empty (unknown context)"
    - Compression and speed enhancement: eliminated a useless prediction from ContextMap and ContextMap2
    - Tiny compression enhancement: rounding in ContextMap and ContextMap2 (1-2-byte gain only)
    - "c8" in exeModel is now global - it may be referenced from other models. Unfortunately using it in matchModel would yield only a very tiny speedup for the cost of more code. Therefore, this matchModel change is not committed.
    - Enhancements in matchModel:
      - using a bit more diffuse hash calculation for fewer collisions
      - now 3 sequences are monitored (lengths: 9, 7, 5)
      - matching is performed all the way (not just for a minimal length of 2, but for 9->7->5)
      - matching expansion didn't work out; the code is provided (commented out), it may just need some tweaking
    - Miscellaneous cleanup in code/comments
    Wavmodel / Wav detection is untouched.
    Mozilla feels better and is sending his regards.
    Attached Files

  14. Thanks (2):

    Darek (19th August 2018),mpais (19th August 2018)

  15. #1330
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    577
    Thanks
    220
    Thanked 833 Times in 341 Posts

    paq8px_v158

    Code:
    Changes:
    - Improved SSE stage for 8bpp grayscale non-PNG images
    - Improved WAV detection
    - Simplified MatchModel to allow for easier tweaking, moved generic contexts to normalModel
    Results:
    - Imagecompression.info 8bpp grayscale image testset
    Attached image: imagecompression.info.png

    - SqueezeChart 8bpp grayscale image testset
    Attached image: squeezechart.png

    Notes:
    I consolidated most of the MatchModel changes from v157 to make it easier to tweak, and made a few small changes. I configured the parameters to closely match v157, so don't expect many changes.

    @Sebastian
    To test on the files from rarewares.org you'll first have to unpack them with WavPack, otherwise the audio model will just be compressing already compressed data.
    Next time I port changes from paq8 to cmix I'll see about fixing the detection there too, so you can test the improvement that the LSTM network gives over just the model.

  16. Thanks (3):

    Darek (19th August 2018),Gotty (19th August 2018),Sebastian (19th August 2018)

  17. #1331
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    You are too fast... Here are the paq8px_v157 scores for my testset; the 4 corpuses are in progress.
    Attached thumbnail: paq8px_v157.jpg

  18. Thanks:

    Gotty (19th August 2018)

  19. #1332
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Thanx, Márcio, for fixing that shift. Strange that it was still good enough, or good at all. Now it must be even better.

  20. #1333
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Quote Originally Posted by Darek View Post
    >By the way - I've added best-score highlights to the table and a minimum-size column with the summary of best scores = the theoretical paq8px minimum.
    That was useful, indeed. Interesting to see that some older versions are still "strong" regarding some files.

  21. #1334
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by mpais View Post
    Code:
    - Improved WAV detection
    it works now, thanks

    a quick test

    Code:
    atrain.wav (+7%), using the same predictor as paq, sac gets something like 1.565.xxx
    paq8 -6 1.614.xxx
    ofr     1.510.xxx
    sac     1.500.xxx
    
    death2.wav: (+4%) looks good, because this is a "special" file, plain prediction+residual coding gets you something like 1.266.xxx (match-model?)
    paq8 -6 1.138.xxx
    ofr     1.129.xxx
    sac     1.090.xxx
    
    female_speech.wav (+4%), low order file
    paq -6    982.xxx
    ofr       951.xxx
    sac       940.xxx
    
    beautyslept.wav (+20%) this is high-order predictor area
    paq -6  1.566.xxx
    ofr     1.342.xxx
    sac     1.310.xxx
    And btw, impressive image compression results.
    I found it helpful to mix the SSE predictions too, not just use static weights.

  22. #1335
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Here are the paq8px_v158 scores for my testset; the 4 corpuses for v157 and v158 are still in progress.

    @Sebastian - which version of optimfrog did you use for testing? The latest I have is 4900ex. emma 0.1.25x64 also got quite a good result.
    Best scores for my 0.WAV file:

    1 242 569 optimfrog 4900ex
    1 290 541 emma v0.1.25x64
    1 311 358 winrk 3.03b
    1 311 430 cmix v15d
    1 323 378 paq8px_v157
    1 327 828 paq8pxd v52
    1 335 999 paq8kx_v7



    Attached thumbnail: paq8px_v158.jpg

  23. Thanks:

    Gotty (19th August 2018)

  24. #1336
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by Darek View Post

    which version of optimfrog did you use for testing?
    5.100 with "--preset max", which uses some tricky heuristics called "aca" and "acm".

    I doubt anything will be better than ofr or sac in the near future, because even if you plug optimfrog predictors into paq, there's still the frame-based parameter optimization.

  25. Thanks:

    Darek (20th August 2018)

  26. #1337
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Ok, thanks. I've tested ofr 5100 with the "--seek min --mode bestnew" option - it looks the best for my testfile. It got a score 2 bytes worse than 4900.
    For sac I've got the 0.0.6a4 version - but the score is 1 394 601 - I need to test it more.

    As I wrote above, my latest sac version is 006a4 - is there a newer release / build?

    Regarding bringing paq closer to the ofr/sac wave compression level - maybe it's possible to plug a similar method for audio files into paq?

  27. #1338
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    289
    Thanks
    10
    Thanked 34 Times in 22 Posts
    Quote Originally Posted by Darek View Post
    Ok, thanks. I've tested ofr 5100 with the "--seek min --mode bestnew" option - it looks the best for my testfile. It got a score 2 bytes worse than 4900.
    For sac I've got the 0.0.6a4 version - but the score is 1 394 601 - I need to test it more.

    As I wrote above, my latest sac version is 006a4 - is there a newer release / build?

    Regarding bringing paq closer to the ofr/sac wave compression level - maybe it's possible to plug a similar method for audio files into paq?
    With optimfrog use "--preset max" without additional options.
    sac 0.0.6a4 is 10 years old... there's a newer but unpublished build (ETA ~2 weeks).

    optimfrog and sac optimize the predictor parameters (e.g. learning rates) per file/frame. With its current architecture this is not possible in paq.
    Last edited by Sebastian; 20th August 2018 at 09:46.

  28. Thanks:

    Darek (20th August 2018)

  29. #1339
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Paq8px v157 scores for the 4 corpuses. Progress in all corpuses. Silesia got a score below 30'9xx'xxx bytes!
    Attached thumbnail: 4_Corpuses_paq8px_v157.jpg

  30. Thanks:

    Gotty (20th August 2018)

  31. #1340
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts

    Pull request: paq8px_v159

    Code:
    - Comments and cosmetic changes in ModelStats
    - Fixed a rare and low-impact overflow ("if (bestLen>=MinLen)") in matchModel
    - More cosmetic changes in matchModel
    - New model for ascii files and files containing ascii fragments: charGroupModel
    Attached Files

  32. Thanks (2):

    Darek (20th August 2018),Mike (20th August 2018)

  33. #1341
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    > - New model for ascii files and files containing ascii fragments: charGroupModel
    Looks like a very good idea. Question - is this model based on the instances reported by paq, as text or ASCIItext?

    I'm asking because I have some ascii files in my testset but there are no new messages during compression. However, the majority of files got gains with v159, and most of them contain some ascii text parts (even small ones).
    So my question about these gains is - are they due to the new ascii model or to the matchModel tweaks and changes?

    Here are also my testset scores for v159.
    Attached thumbnail: paq8px_v159.jpg
    Last edited by Darek; 20th August 2018 at 18:05.

  34. Thanks:

    Gotty (20th August 2018)

  35. #1342
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    No, it is an independent model.
    There have been similar approaches in wordModel (f4, mask) and different "masks" in TextModel, so there is some overlap, but there is also novelty in charGroupModel.
    Words (letter-sequences) and numbers (digit-sequences) are collapsed into one character. So "A (fairly simple) sentence." becomes "A (a a) a.", and "A picture is worth a 1000 words!" becomes "A a a a a 1 a!"
    And the model uses these simplified "strings" as contexts for a ContextMap.
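    As a rough sketch of that collapsing step (not the actual charGroupModel code - here only lowercase letter runs and digit runs are collapsed, which happens to reproduce the two examples above; the real grouping may differ):

    Code:
    #include <cctype>
    #include <string>

    // Collapse lowercase letter runs to 'a' and digit runs to '1'; keep everything else.
    std::string collapse(const std::string& s) {
      std::string out;
      for (size_t i = 0; i < s.size(); ) {
        if (std::islower((unsigned char)s[i])) {
          while (i < s.size() && std::islower((unsigned char)s[i])) i++;
          out += 'a';
        } else if (std::isdigit((unsigned char)s[i])) {
          while (i < s.size() && std::isdigit((unsigned char)s[i])) i++;
          out += '1';
        } else {
          out += s[i++];
        }
      }
      return out;
    }
    // collapse("A picture is worth a 1000 words!") == "A a a a a 1 a!"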

    Edit: a "mask" is a simplification of the last few bytes of the input sequence, emphasizing some feature that looks promising to use as a context.
    Last edited by Gotty; 20th August 2018 at 18:49.

  36. Thanks:

    Darek (20th August 2018)

  37. #1343
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Quote Originally Posted by Darek View Post
    >Then my question about these gains is - are there due to new ascii model or matchModel tweaks and changes?
    There was only one tiny-tiny fix in the matchModel, and it is worth just 1-2 bytes in the case of only some files.
    So the gains come from the new ascii model (charGroupModel).
    There is no special block type for "ascii" so on the screen you see nothing special. If the block is not image, and not audio, then the charGroupModel works silently in the background.

  38. #1344
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,272
    Thanks
    802
    Thanked 545 Times in 415 Posts
    4 corpuses scores for paq8px v158. Slight improvements for Silesia, Canterbury and MaximumCompression. A slightly worse score for Calgary.
    Attached thumbnail: 4_Corpuses_paq8px_v158.jpg

  39. #1345
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    474
    Thanked 175 Times in 85 Posts
    .wav data is still not detected in tracker modules like .xm, .it, .mptm of the SqueezeChart

    https://drive.google.com/file/d/0ByL...ew?usp=sharing
    https://drive.google.com/file/d/0ByL...ew?usp=sharing
    https://drive.google.com/file/d/0ByL...ew?usp=sharing

    .sf2 are detected as audio, but info is not precise:
    http://sonimusicae.free.fr/Banques/S...-Diato-sf2.zip

    This file is detected as one sample in 16-bit mono but it contains 90 samples in 16-bit stereo.

  40. #1346
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 168 Times in 124 Posts
    I don't know if paq8px already has this match model: in CMV, since its first version, there is a "sparse match" model, which has a step and an initial gap.
    You can combine step S and initial gap G as you wish.
    E.g. ("x" are the bytes of the context to match, "." is ignored, "y" is the byte to predict):
    Step 3, initial gap 0: ..x..x..xy
    Step 3, initial gap 1: .x..x..x.y
    Step 3, initial gap 2: x..x..x..y
    Step 1, initial gap 0: xxxxxxxxxy
    Step 1, initial gap 1: xxxxxxxx.y
    Step 1, initial gap 2: xxxxxxx..y
    IIRC, step 1 with initial gap G is useful e.g. in FP.LOG.
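    For illustration, here is a sketch of how such a strided context could be hashed (an assumed helper, not CMV or paq8px code); the hash would then be looked up in a table of previous positions exactly like in an ordinary match model:

    Code:
    #include <cstdint>
    #include <vector>

    // Hash 'count' bytes taken every 'step' positions, skipping the last 'gap'
    // bytes before the byte to be predicted at 'pos'. The caller must ensure
    // pos > gap + count*step so the indices stay inside the buffer.
    uint64_t sparseContextHash(const std::vector<uint8_t>& buf, size_t pos,
                               int step, int gap, int count) {
      uint64_t h = 0;
      for (int k = 0; k < count; k++) {
        size_t idx = pos - 1 - gap - (size_t)k * step;
        h = h * 0x100000001B3ull + buf[idx];   // simple FNV-style combine
      }
      return h;
    }
    // step 3, gap 0 hashes the "..x..x..xy" pattern; step 3, gap 2 the "x..x..x..y" pattern.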

  41. Thanks:

    Gotty (1st September 2018)

  42. #1347
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    In paq8px there is a matchModel and there is a sparseModel.
    The matchModel tries to find long byte sequences in the past that match the characters preceding the byte to be predicted.
    The sparseModel just uses some of the preceding bytes (with gaps) as contexts.
    So you say that "sparse match" would be a combination of these two?

    >> IIRC step 1 with initial gap G is useful e.g. in FP.LOG.
    How much is G?

  43. #1348
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    577
    Thanks
    220
    Thanked 833 Times in 341 Posts
    Well, you guys have been busy

    Quote Originally Posted by Gotty View Post
    Thanx, Márcio, for fixing that shift. Strange that it was still good enough, or good at all. Now it must be even better.
    I'm also including an even better diffuse hash in the next version (see here). It won't make much difference, but at least it's theoretically better.

    Quote Originally Posted by Sebastian View Post
    And btw, impressive image compression results.
    I found it helpful to mix the SSE predictions too, not just use static weights.
    Back at you, impressive audio compression results.
    In EMMA I use a mixer with special contexts to mix the SSE predictions, but for now, to optimize the SSE contexts, it's simpler to test with fixed weights.

    Quote Originally Posted by Sebastian View Post
    optimfrog and sac optimize the predictor parameters (e.g. learning rates) per file/frame. With its current architecture this is not possible in paq.
    It's possible, but it's messy. We can use sac as a transform for audio blocks, and then simply skip them after the transform. I think Kaido did something like that in previous versions of paq8pxd that used TTA.

    Quote Originally Posted by Mauro Vezzosi View Post
    I don't know if paq8px already has this match model: in CMV, since its first version, there is a "sparse match" model, which has a step and an initial gap.
    I tried something similar (but not as configurable) in EMMA, but found it only helped sporadically and so wasn't worth including even in the highest complexity level for the match model.

    @Gotty:
    What do you think about merging the new model into the TextModel? It's basically modelling a different text quantization strategy, like the other masks, so it seems logical that it fits into that model. And it may be useful to combine its mask with other info available in the text model. Also, instead of a long sequence of if-else's, why not use a LUT for the first 128 elements? It would make the code cleaner. Oh, and I see someone doesn't like hex constants

    Right now I think one of the priorities should be to improve the exeModel. It's one area where cmix is getting much better results, even though it has the same model, so there's probably still a lot of low-hanging fruit to pick in that area.

  44. #1349
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 168 Times in 124 Posts
    Quote Originally Posted by Gotty View Post
    So you say that "sparse match" would be a combination of these two?
    Yes somehow, it's a match model in which the bytes in the sequence are not consecutive, but have a fixed step (I don't know how to explain it better in words).
    I call it "sparse match", you can call it whatever you want (masked bytes match model?).
    You can choose the minimum and maximum length of the match as you want.
    The standard match model is "Step 1, initial gap 0: xxxxxxxxxy".
    Code:
    More example ("." is ignored, "y" is the byte to predict):
    |--------------| <-- Characters preceding the byte to be predicted
    abcdefghjkilmnopy
    A match model with step 5 initial gap 2 takes into account only these bytes: ...d....j....n..[y]
    and match the following sequences in the past                              :    d1234j5678n90
                                                                                    dQJRIjVJECnIZ
                                                                                    ^    ^    ^
    In this example the match takes into account 3 bytes just to show you what I mean,
    but you can use the same 9->7->5 of the standard paq8px match model
    (I suggest to use lower values in sparse match)
    Quote Originally Posted by Gotty View Post
    >>IIRC step 1 with initial gap G are useful e.g. in FP.LOG.
    How much is G?
    Whatever you want - it depends on your implementation. IIRC in CMV there isn't any particular limit; however, I use a maximum gap of 8 with step 1 and IIRC a gap of 30 with step 31.

  45. Thanks:

    Gotty (1st September 2018)

  46. #1350
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    733
    Thanks
    424
    Thanked 480 Times in 255 Posts
    Quote Originally Posted by mpais View Post
    I'm also including an even better diffuse hash in the next version (see here). It won't make much difference, but at least it's theoretically better.
    It looks like a promising one, doesn't it? That was the very first one I tried, with excitement, when experimenting, and I found it performed slightly worse than some other ones.
    Then I made an experiment: I hashed different value ranges into different hash table sizes, counted when collisions happen, and visualized them on a heatmap. The "low bias" hash function may be very good for the full range of inputs, but the one I finally included was a tiny bit better with lower input values and performed a tiny bit better in paq8px.
    I'm now experimenting with 1) properly spreading low values (combining bytes in a hash) and 2) properly combining "more" hash values without getting too many collisions. I have to say that the original hash function in paq8px still performs surprisingly well with the "permute" spreading. But the time has come to make things better (or faster).
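    For illustration, a toy harness of the kind of collision-counting experiment described here (illustrative only, not the harness used for the heatmaps): hash a range of input values, keep only the low tableBits bits as the bucket index, and count how many inputs land on an already-used bucket.

    Code:
    #include <cstddef>
    #include <cstdint>
    #include <unordered_set>

    template <class HashFn>
    size_t countCollisions(uint64_t firstValue, uint64_t count, int tableBits, HashFn h) {
      std::unordered_set<uint64_t> used;
      size_t collisions = 0;
      const uint64_t mask = (uint64_t(1) << tableBits) - 1;   // hash table size = 2^tableBits
      for (uint64_t v = firstValue; v < firstValue + count; v++)
        if (!used.insert(h(v) & mask).second) collisions++;   // bucket already taken
      return collisions;
    }
    // e.g. countCollisions(0, 1<<20, 22, [](uint64_t x) { return x*0x9E3779B97F4A7C15ull; });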

    Quote Originally Posted by mpais View Post
    What do you think about merging the new model into the TextModel? It's basically modelling a different text quantization strategy, like the other masks, so it seems logical that it fits into that model. And it may be useful to combine its mask with other info available in the text model.
    I first encountered the masks (f4, mask, mask2) in the old wordModel, and they (f4 at least) were a bit illogical. So I thought of including mine there, replacing one of the old ones.
    Then I thought it would be better to put it in the TextModel (yeah).
    Finally I decided to make a separate model:
    1) I intend to include a binary charGroup as well. That is not as strong as the ascii one, however, and more importantly not finished yet;
    2) I "exported" the feature mask to ModelStats so any model may benefit from it (well, not any, I had the TextModel in mind); and
    3) From an object-oriented design point of view I'd like to keep objects (functions) as separated as possible so that they don't depend too much on each other (decoupling). That's good for maintenance, and allows greater flexibility and interoperability.
    So I'd say let's keep it separated for now. Of course the TextModel should use it.

    Quote Originally Posted by mpais View Post
    Also, instead of a long sequence of if-else's, why not use a LUT for the first 128 elements? It would make the code cleaner.
    Now the code is inefficient, too. When I experimented with the different groups the if-else way looked easier: changing the groups was easier than maintaining a LUT. But now that the groups are tweaked enough and seem to be final, I think a LUT will be the way to go. So I agree and I'll do that.

    Quote Originally Posted by mpais View Post
    Oh, and I see someone doesn't like hex constants :D
    It depends on the semantics ;-) Which one is it?
    Do you mean the 3ff vs 1023? I changed it because in the "doc" it states 1023 and in the code it was 3ff. 3ff is OK when masking, but when it means a "limit" a normal decimal value seems better (semantically). But that's just my preference. The same way I prefer "x*2+1" when the intention is "arithmetic" and "(x<<1) | 1" when it is a bit-concatenation. The result is the same, but the first-time reader grasps the purpose behind the operation more easily.
    Or do you mean, for example, "((U64(1)<<60)-1)" in the new model? It means "keep 60 bits". So it's easier to understand than 0x0fffffffffffffff. Not like the former is very easy to read at once ;-) But the "60" is there - and it's easy to change. I had an idea to utilize a define and use it like "& BITMASK(60)". It is certainly cleaner and the easiest to read. So I'll do that next time.
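    For example, the define could look like this (a sketch only; U64 is the paq8px typedef for a 64-bit unsigned integer, and n must stay below 64 because shifting by 64 is undefined):

    Code:
    #define BITMASK(n) ((U64(1) << (n)) - 1)
    // usage: hash & BITMASK(60)   // keep the low 60 bits, same as ((U64(1)<<60)-1)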


