Thanx Sebastian - that seems to be a bug report.
Unfortunately I don't have working knowledge in that area.
Do you happen to know where the problem is (in the code), or did you just experience that some audio content is not detected properly?
None of the testfiles from rarewares.org get detected properly
The bug is between lines 9455 and 9498; the file detection code is a mess
I think it is this line
Code:
if (p==16+wavlen && (buf1!=0x666d7420 || bswap(buf0)!=16)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && bswap(buf0)!=16);
The fmt-chunk size is 18 for wave-extensible files, so adding "bswap(buf0)!=18" should fix it
Couldn't agree more.
It's on my long-term to-do list to fix the readability and maintainability there. Unfortunately I do not have the proper knowledge (yet) to do so.
Thanx for the fix-suggestion. I'll try that when I get back to my pc.
>However it's not perfect: not every file is happy, and mozilla himself was even better in an earlier version (v152 was 7 885 780). I'll need to find an equilibrium. Keep your fingers crossed.
That's normal - some files can be "sacrificed" if total compression or most of the files improve...
As for the mozilla file - paq8pxd's best score (for v48) is 7 384 310 - 500KB less due to improved parsers (e.g. DEC). Maybe this is a way to improve mozilla compression even more - finding "unserviced" types of standards/files.
By the way - I've added best-score highlights to the table and a minimum size column with the summary of best scores = the theoretical paq8px minimum.
Nice catch. In EMMA the models first predict() and then update(), and when merging everything I was focused on getting fast mode to work and forgot about that detail.
Originally Posted by Gotty
Code:
...
- the default context was c0, now it is 0 (indicating "no match" or "mismatch")
- skipping mixing (add(0)) when state is empty (unknown context).
...
Did you test these changes in isolation, i.e., without the other small changes, to see their effect on compression?
The reason I left the default context at c0 was that the mixer would then probably still keep the weights for those inputs somewhat useful, since they might only be in use when we have a match or are in delta mode.
The improvements reported by Darek are really small, and might just be due to the fixed delta mode with the new context or to the move to using the normalModel for special data types too.
As for mozilla, the degradation from v153 onwards is most likely due to the additional block segmentation information from the new text detection.
Originally Posted by Sebastian
None of the testfiles from rarewares.org get detected properly
The bug is between lines 9455 and 9498; the file detection code is a mess
I think it is this line
Code:
if (p==16+wavlen && (buf1!=0x666d7420 || bswap(buf0)!=16)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && bswap(buf0)!=16);
The fmt-chunk size is 18 for wave-extensible files, so adding "bswap(buf0)!=18" should fix it
That won't do it. Here, this should fix it:
if (p==16+wavlen && (buf1!=0x666d7420 || ((wavm=bswap(buf0)-16)&0xFFFFFFFD)!=0)) wavlen=((bswap(buf0)+1)&(-2))+8, wavi*=(buf1==0x666d7420 && (wavm&0xFFFFFFFD)!=0);
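To see why the mask works: the fmt chunk size must be 16 (plain PCM) or 18 (the wave-extensible case mentioned above). Then size-16 is 0 or 2, and since those two values differ only in bit 1, masking with 0xFFFFFFFD maps both to zero. A minimal standalone sketch (validFmtChunkSize is a hypothetical name, not from paq8px):
Code:
#include <cstdint>
#include <cassert>

// Accept exactly the two valid fmt chunk sizes, 16 and 18:
// (size-16) is then 0 or 2, and 0xFFFFFFFD clears bit 1, mapping both to zero.
static bool validFmtChunkSize(uint32_t size) {
  return ((size - 16u) & 0xFFFFFFFDu) == 0;
}

int main() {
  assert(validFmtChunkSize(16));   // plain PCM header
  assert(validFmtChunkSize(18));   // wave-extensible header
  assert(!validFmtChunkSize(17));
  assert(!validFmtChunkSize(20));
  return 0;
}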
I'll include this fix in the next version.
Originally Posted by Gotty
Couldn't agree more.
It's on my long-term to-do list to fix the readability and maintainability there. Unfortunately I do not have the proper knowledge (yet) to do so.
Thanx for the fix-suggestion. I'll try that when I get back to my pc.
The code being a mess and limiting was one of the reasons I started Fairytale, to build something better. As-is, I'd have to make things a lot messier just to port the parsers from EMMA.
What about adding an ordinary least squares predictor to the image model?
The code is already in WAV_MODEL but I could provide a cleaner version. You just use some neighborhood pixels and adaptively optimize the coefficients.
Sure, using some neighborhood as context is already a variant of this, but compared to fixed-weight predictors it either simplifies the model or improves compression.
Originally Posted by mpais
Did you test these changes in isolation, i.e., without the other small changes, to see their effect on compression?
The reason I left the default context at c0 was that the mixer would then probably still keep the weights for those inputs somewhat useful, since they might only be in use when we have a match or are in delta mode.
Yes. All changes in code brought improvements here and there. I had the idea of skipping prediction when the matchModel has no clue, so the weights are kept intact in the Mixer until the matchModel kicks in again. And it worked.
Originally Posted by mpais
As for mozilla, the degradation from v153 onwards is most likely due to the additional block segmentation information from the new text detection.
Luckily I already know that the mozilla degradation in v156 is caused by the "add(0)"s in ContextMap. Most if not all small files benefited from this change; the large files in Silesia didn't. Now I (think I) know why. For many files state==0 is not useful: the mixer sees predictions from state==0 as just garbage, since new contexts are usually different from each other. But in some files new contexts (state==0) have some similarities, and for them it is useful to keep mixing the predictions for state==0. It looks like mozilla is one of them.
Unfortunately I tested with the smaller corpuses only, and so could not see what was happening to Silesia in the meantime. Silesia is a bit big and takes a night for me to run one round.
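For illustration, the two behaviors as a conceptual sketch - the names (Mixer, stretch, feed) are stand-ins, not the actual paq8px API:
Code:
#include <cmath>

struct Mixer { void add(int stretchedP) { /* collect one input */ } };
// paq-style stretch: logit of a 12-bit probability (assumes p in 1..4095)
int stretch(int p) { return (int)(std::log(p / (4096.0 - p)) * 256.0); }

// One ContextMap slot feeding the mixer; st==0 = never-seen context.
void feed(Mixer &mixer, int st, int stateMapP) {
  if (st == 0)
    mixer.add(0);                  // v156: neutral vote, carries no information
  else
    mixer.add(stretch(stateMapP)); // prediction mapped from the bit history
  // Before v156 even st==0 was mapped through a StateMap. For files like
  // mozilla, where brand-new contexts tend to behave alike, those predictions
  // are a signal rather than garbage - hence the regression.
}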
Originally Posted by mpais
I'll include this fix in the next version.
Thanx for that.
In my next version I'm bringing more improvements to the matchModel, and fix the add(0) problem mentioned above. Smaller corpuses are tested and are happy, Silesia will be on the run after a couple of hours. Then I'll post. Did you begin a new version yourself?
Edit: Ah, forgot one thing: thank you so much for the excellent matchModel! The best models are those that are flexible and easy to improve. And this is such a model.
Originally Posted by Sebastian
What about adding an ordinary least squares predictor to the image model?
The code is already in WAV_MODEL but I could provide a cleaner version. You just use some neighborhood pixels and adaptively optimize the coefficients.
Sure, using some neighborhood as context is already a variant of this, but compared to fixed-weight predictors it either simplifies the model or improves compression.
Sure, go for it, but I doubt you'll see significant compression improvements from it.
And right now I'd say the weakest point of the model is with artificial images, like graphics and/or text, where the sharp discontinuities make most linear predictors useless and we need to rely on the texture descriptors of the contextMap.
Originally Posted by Gotty
Yes. All changes in code brought improvements here and there. I had the idea of skipping prediction when the matchModel has no clue, so the weights are kept intact in the Mixer until the matchModel kicks in again. And it worked.
Cool, then I don't need to run any tests. And nice to see that we had similar ideas and yet tried different approaches.
Originally Posted by Gotty
In my next version I'm bringing more improvements to the matchModel, and fix the add(0) problem mentioned above. Smaller corpuses are tested and are happy, Silesia will be on the run after a couple of hours. Then I'll post. Did you begin a new version yourself?
My internal version is usually 3 or 4 versions ahead of the code I publish; I just remove what I still haven't finished testing/improving and publish what I've got.
For the next version I'll probably keep improving the SSE stage, since you want to work on the mixer. No point in improving the mixing stage if the SSE stage is weak.
Originally Posted by Gotty
Edit: Ah, forgot one thing: thank you so much for the excellent matchModel! The best models are those that are flexible and easy to improve. And this is such a model.
Well, it was just a quick hack, I honestly didn't expect much improvement from it.
On my (way, way too long) to-do list is a 2D fuzzy match model for the image models, to see if it can help on what I described above.
And right now the 8bpp image model already looks for horizontal symmetries, i.e., it looks for mirrored features, which we could also use on the 24/32bpp image model.
Parameters are:
n = number of predictors
kmax = update interval (default 1, larger is faster)
lambda = decay rate of the covariance estimation (default 0.998)
nu = Tikhonov regularization parameter (default 0.001)
Improvements could be made by weighting pixels farther away less (e.g. by Manhattan distance) - edges are also a problem
Code:
#include <vector>
#include <cmath>

// general linear model of the form p = w1*p1 + w2*p2 + ... + wn*pn
template <class T>
class OLS {
  typedef std::vector<T> vec1D;
  typedef std::vector<vec1D> vec2D;
  const T ftol = 1E-8;
public:
  OLS(int n, int kmax = 1, T lambda = 0.998, T nu = 0.001)
    : n(n), kmax(kmax), lambda(lambda), nu(nu),
      x(n), w(n), b(n), mcov(n, vec1D(n)), mchol(n, vec1D(n))
  {
    km = 0;
  }
  T Predict(const vec1D &p) {
    x = p;
    T sum = 0.;
    for (int i = 0; i < n; i++) sum += w[i]*x[i];
    return sum;
  }
  void Update(T val) {
    // exponentially decayed estimates of the covariance matrix
    // and the cross-correlation vector
    for (int j = 0; j < n; j++)
      for (int i = 0; i < n; i++) mcov[j][i] = lambda*mcov[j][i]+(1.0-lambda)*(x[j]*x[i]);
    for (int i = 0; i < n; i++) b[i] = lambda*b[i]+(1.0-lambda)*(x[i]*val);
    km++;
    if (km >= kmax) { // re-solve for the weights every kmax samples
      if (!Factor(mcov)) Solve(b, w);
      km = 0;
    }
  }
private:
  int Factor(const vec2D &mcov) {
    mchol = mcov; // copy the matrix
    for (int i = 0; i < n; i++) mchol[i][i] += nu; // Tikhonov regularization
    // Cholesky decomposition: mcov = L*L^T, with L stored in mchol
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < i; j++) {
        T sum = mchol[i][j];
        for (int k = 0; k < j; k++) sum -= (mchol[i][k]*mchol[j][k]);
        mchol[i][j] = sum/mchol[j][j];
      }
      T sum = mchol[i][i];
      for (int k = 0; k < i; k++) sum -= (mchol[i][k]*mchol[i][k]);
      if (sum > ftol) mchol[i][i] = std::sqrt(sum);
      else return 1; // matrix is not positive definite
    }
    return 0;
  }
  void Solve(const vec1D &b, vec1D &sol) {
    // forward substitution: L*y = b
    for (int i = 0; i < n; i++) {
      T sum = b[i];
      for (int j = 0; j < i; j++) sum -= (mchol[i][j]*sol[j]);
      sol[i] = sum/mchol[i][i];
    }
    // backward substitution: L^T*w = y
    for (int i = n-1; i >= 0; i--) {
      T sum = sol[i];
      for (int j = i+1; j < n; j++) sum -= (mchol[j][i]*sol[j]);
      sol[i] = sum/mchol[i][i];
    }
  }
  int n, kmax, km;
  T lambda, nu;
  vec1D x, w, b;
  vec2D mcov, mchol;
};
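A hypothetical usage sketch for 8bpp pixel prediction with the class above (the neighborhood choice here is illustrative, not the actual image model):
Code:
#include <cstdio>

int main() {
  const int width = 256, height = 256;
  static unsigned char img[height][width] = {}; // imagine real pixel data here
  OLS<double> ols(4);                           // 4 predictors: W, N, NW, NE
  std::vector<double> nb(4);
  for (int y = 1; y < height; y++)
    for (int x = 1; x < width - 1; x++) {
      nb[0] = img[y][x-1];           // W
      nb[1] = img[y-1][x];           // N
      nb[2] = img[y-1][x-1];         // NW
      nb[3] = img[y-1][x+1];         // NE
      double pred = ols.Predict(nb); // adaptive linear prediction
      ols.Update(img[y][x]);         // adapt the coefficients to the true value
      (void)pred; // a codec would model the residual img[y][x] - pred
    }
  return 0;
}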
- New hash functions (currently used in matchModel only)
- Compression enhancement: "skipping mixing (add(0)) when state is empty (unknown context)" from v156 is now "skipping some mixing when state is empty (unknown context)"
- Compression and speed enhancement: eliminated a useless prediction from ContextMap and ContextMap2
- Tiny compression enhancement: rounding in ContextMap and ContextMap2 (1-2-byte gain only)
- "c8" in exeModel is now global - it may be referenced from other models. Unfortunately using it in matchModel would yield only a very tiny speedup for the cost of more code. Therefore, this matchModel change is not committed.
- Enhancements in matchModel:
- using a somewhat more diffuse hash calculation for fewer collisions
- now 3 sequences are monitored (lengths: 9, 7, 5)
- matching is performed all the way (not just for a minimal length of 2, but for 9->7->5) - see the sketch after this list
- matching expansion didn't work out; the code is provided (commented out) - it may just need some tweaking
- Miscellaneous cleanup in code/comments
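A rough sketch of that longest-first matching (illustrative structure only - the real matchModel uses fixed-size hash tables and verifies candidates, not std::unordered_map):
Code:
#include <cstdint>
#include <unordered_map>
#include <vector>

struct MatchFinder {
  std::vector<uint8_t> buf;                    // data seen so far
  std::unordered_map<uint64_t, size_t> tbl[3]; // one table per monitored length
  const int lens[3] = {9, 7, 5};

  uint64_t hashTail(int len) const {           // hash of the last len bytes
    uint64_t h = 0;
    for (size_t i = buf.size() - len; i < buf.size(); i++)
      h = h * 0x9E3779B97F4A7C15ull + buf[i];  // some diffuse multiplicative hash
    return h;
  }
  // Matching "all the way": try length 9 first, then fall back to 7, then 5.
  // Returns the position right after the previous occurrence (0 = no match);
  // buf[pos] is then the predicted next byte.
  size_t find() const {
    for (int k = 0; k < 3; k++) {
      if (buf.size() < (size_t)lens[k]) continue;
      auto it = tbl[k].find(hashTail(lens[k]));
      if (it != tbl[k].end() && it->second != buf.size()) return it->second;
    }
    return 0;
  }
  void add(uint8_t c) {                        // append one byte, index new tails
    buf.push_back(c);
    for (int k = 0; k < 3; k++)
      if (buf.size() >= (size_t)lens[k])
        tbl[k][hashTail(lens[k])] = buf.size();
  }
};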
Wavmodel / Wav detection is untouched.
Mozilla feels better and is sending his regards.
Notes:
I consolidated most of the MatchModel changes from v157 to make it easier to tweak, and made a few small changes. I configured the parameters to closely match v157, so don't expect many changes.
@Sebastian
To test on the files from rarewares.org you'll first have to unpack them with WavPack, otherwise the audio model will just be compressing already compressed data.
Next time I port changes from paq8 to cmix I'll see about fixing the detection there too, so you can test the improvement that the LSTM network gives over just the model.
atrain.wav (+7%), using the same predictor as paq, sac gets something like 1.565.xxx
paq8 -6 1.614.xxx
ofr 1.510.xxx
sac 1.500.xxx
death2.wav: (+4%) looks good, because this is a "special" file, plain prediction+residual coding gets you something like 1.266.xxx (match-model?)
paq8 -6 1.138.xxx
ofr 1.129.xxx
sac 1.090.xxx
female_speech.wav (+4%), low order file
paq -6 982.xxx
ofr 951.xxx
sac 940.xxx
beautyslept.wav (+20%) this is high-order predictor area
paq -6 1.566.xxx
ofr 1.342.xxx
sac 1.310.xxx
And btw, impressive image compression results.
I found it helpful to mix the SSE predictions too, not just using static weights.
Here are the paq8px_v158 scores for my testset; scores for the 4 corpuses for v157 and v158 are still in progress.
@Sebastian - which version of optimfrog did you use for testing? The latest I have is 4900ex. emma 0.1.25x64 also got quite a good result.
Best scores for my 0.WAV file:
5.100 "--preset max" which uses some tricky heuristics called "aca" and "acm"
I doubt anything will be better than ofr or sac in the near future, because even if you plug optimfrog predictors into paq, there's still frame based parameter optimization
Ok, thanks. I've tested ofr 5100 with the "--seek min --mode bestnew" option - it looks the best for my testfile, though it got a score 2 bytes worse than 4900.
For sac I've got version 0.0.6a4 - but the score is 1 394 601 - I need to test it more.
As I wrote above, my latest sac version is 0.0.6a4 - is there a newer release/build?
Regarding bringing paq up to the ofr/sac wave compression level - maybe it's possible to plug a similar method for audio files into paq?
Originally Posted by Darek
Ok, thanks. I've tested ofr 5100 with the "--seek min --mode bestnew" option - it looks the best for my testfile, though it got a score 2 bytes worse than 4900.
For sac I've got version 0.0.6a4 - but the score is 1 394 601 - I need to test it more.
As I wrote above, my latest sac version is 0.0.6a4 - is there a newer release/build?
Regarding bringing paq up to the ofr/sac wave compression level - maybe it's possible to plug a similar method for audio files into paq?
With optimfrog use "--preset max" without additional options
sac 0.0.6a4 is 10 years old... there's a newer but unpublished build (ETA ~2 weeks)
optimfrog and sac optimize the predictor parameters (e.g. learning rates) per file/frame. With its current architecture this is not possible with paq.
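Conceptually, that per-frame optimization is a brute-force search over parameter candidates; a sketch with a dummy cost model (encodeFrame here is entirely hypothetical):
Code:
#include <cstdint>
#include <cstdio>
#include <vector>

// Dummy stand-in for a real frame encoder: returns the compressed size a
// given predictor learning rate would produce (placeholder cost model only).
static size_t encodeFrame(const std::vector<int16_t> &frame, double rate) {
  return frame.size() - (size_t)(rate * 10000.0);
}

int main() {
  std::vector<int16_t> frame(4096);              // one audio frame
  const double rates[] = {0.0005, 0.001, 0.002}; // candidate learning rates
  int best = 0;
  size_t bestSize = (size_t)-1;
  for (int i = 0; i < 3; i++) {                  // brute-force per-frame search
    size_t size = encodeFrame(frame, rates[i]);
    if (size < bestSize) { bestSize = size; best = i; }
  }
  // The winning index goes into the stream so the decoder can mirror it.
  // paq's single-pass, bitwise architecture has no equivalent retry loop.
  printf("frame uses rate %g\n", rates[best]);
  return 0;
}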
- Comments and cosmetic changes in ModelStats
- Fixed a rare and low-impact overflow ("if (bestLen>=MinLen)") in matchModel
- More cosmetic changes in matchModel
- New model for ascii files and files containing ascii fragments: charGroupModel
> - New model for ascii files and files containing ascii fragments: charGroupModel
Looks like a very good idea. Question - are the instances this model finds reported by paq? As text or ASCIItext?
I'm asking because I have some ascii files in my testset but there are no new messages during compression. However, the majority of files got gains in v159, and most of them contain some ascii text parts (even small ones).
Then my question about these gains is - are they due to the new ascii model or to the matchModel tweaks and changes?
No, it is an independent model.
There have been similar approaches in wordModel (f4, mask) and different "masks" in TextModel, so there is some overlap, but there is novelty in charGroupModel.
Words (letter-sequences) and numbers (digit-sequences) are collapsed into one character. So "A (fairly simple) sentence." becomes "A (a a) a.", and "A picture is worth a 1000 words!" becomes "A a a a a 1 a!"
And the model uses these simplified "strings" as contexts for a ContextMap.
Edit: a "mask" is a simplification of the last some bytes of the input sequence emphasizing some feature that looks promising to use it as a context.
>Then my question about these gains is - are they due to the new ascii model or to the matchModel tweaks and changes?
There was only one tiny-tiny fix in the matchModel. That amounts to just 1-2 bytes, and only for some files.
So that's the new ascii model (charGroupModel).
There is no special block type for "ascii", so you see nothing special on the screen. If the block is not an image and not audio, then the charGroupModel works silently in the background.
I don't know if paq8px already has this match model: in CMV, since its first version, there is a "sparse match" model, which has a step and an initial gap.
You can combine step S and initial gap G as you wish.
E.g. ("x" are the bytes of the context to match, "." is ignored, "y" is the byte to predict):
Step 3, initial gap 0: ..x..x..xy
Step 3, initial gap 1: .x..x..x.y
Step 3, initial gap 2: x..x..x..y
Step 1, initial gap 0: xxxxxxxxxy
Step 1, initial gap 1: xxxxxxxx.y
Step 1, initial gap 2: xxxxxxx..y
IIRC step 1 with initial gap G is useful, e.g. in FP.LOG.
In paq8px there is a matchModel and there is a sparseModel.
matchModel is trying to find long byte sequences in the past that match the characters preceding the byte to be predicted.
sparseModel is just using some of the preceding bytes (with gaps) as contexts.
So you say that "sparse match" would be a combination of these two?
>>IIRC step 1 with initial gap G is useful, e.g. in FP.LOG.
How much is G?
Thanx, Márcio, for fixing that shift. Strange that it was still good enough, or good at all. Now it must be even better.
I'm also including an even better diffuse hash in the next version (see here). It won't make much difference, but at least it's theoretically better.
Originally Posted by Sebastian
And btw, impressive image compression results.
I found it helpful to mix the SSE predictions too, not just using static weights.
Back at you, impressive audio compression results.
In EMMA I use a mixer with special contexts to mix the SSE predictions, but for now, to optimize the SSE contexts, it's simpler to test with fixed weights.
Originally Posted by Sebastian
optimfrog and sac optimize the predictor parameters (e.g. learning rates) per file/frame. With its current architecture this is not possible with paq.
It's possible, but it's messy. We can use sac as a transform for audio blocks, and then simply skip them after the transform. I think Kaido did something like that in previous versions of paq8pxd that used TTA.
Originally Posted by Mauro Vezzosi
I don't know if paq8px already has this match model: in CMV, since its first version, there is a "sparse match" model, which has a step and an initial gap.
I tried something similar (but not as configurable) in EMMA, but found it only helped sporadically and so wasn't worth including even in the highest complexity level for the match model.
@Gotty:
What do you think about merging the new model into the TextModel? It's basically modelling a different text quantization strategy, like the other masks, so it seems logical that it fits into that model. And it may be useful to combine its mask with other info available in the text model. Also, instead of a long sequence of if-else's, why not use a LUT for the first 128 elements? It would make the code cleaner. Oh, and I see someone doesn't like hex constants :D
Right now I think one of the priorities should be to improve the exeModel. It's one area where cmix is getting much better results, even though it has the same model, so there's probably still a lot of low-hanging fruit to pick in that area.
Originally Posted by Gotty
So you say that "sparse match" would be a combination of these two?
Yes, in a way: it's a match model in which the bytes in the sequence are not consecutive but have a fixed step (I don't know how to explain it better in words).
I call it "sparse match", you can call it whatever you want (masked bytes match model?).
You can choose the minimum and maximum length of the match as you want.
The standard match model is "Step 1, initial gap 0: xxxxxxxxxy".
Code:
More examples ("." is ignored, "y" is the byte to predict):
|--------------| <-- Characters preceding the byte to be predicted
abcdefghjkilmnopy
A match model with step 5 initial gap 2 takes into account only these bytes: ...d....j....n..[y]
and match the following sequences in the past:
d1234j5678n90
dQJRIjVJECnIZ
^    ^    ^
In this example the match takes into account 3 bytes just to show you what I mean,
but you can use the same 9->7->5 of the standard paq8px match model
(I suggest using lower values in sparse match)
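In code, hashing such a sparse context could look like this (names are illustrative, not the actual CMV implementation):
Code:
#include <cstdint>
#include <vector>

// Hash of the sparse context "step S, initial gap G, length L": the L bytes
// at positions pos-1-G, pos-1-G-S, pos-1-G-2*S, ... A sparse match model
// keeps a table hash -> position and predicts the byte that followed the
// previous occurrence of the same sparse context.
uint64_t sparseContextHash(const std::vector<uint8_t> &buf,
                           size_t pos, size_t S, size_t G, size_t L) {
  uint64_t h = 0;
  size_t i = pos - 1 - G;              // requires pos > G + (L-1)*S
  for (size_t k = 0; k < L; k++, i -= S)
    h = h * 0x9E3779B97F4A7C15ull + buf[i];
  return h;
}
// With S=5, G=2, L=3 and buf = "abcdefghjkilmnop" (pos 16 = the byte to
// predict), the hashed bytes are n, j, d - the ...d....j....n..[y] pattern.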
Originally Posted by Gotty
>>IIRC step 1 with initial gap G is useful, e.g. in FP.LOG.
How much is G?
Whatever you want - it depends on your implementation. IIRC in CMV there isn't any particular limit; however, I use a maximum gap of 8 with step 1 and IIRC a gap of 30 with step 31.
Originally Posted by mpais
I'm also including an even better diffuse hash in the next version (see here). It won't make much difference, but at least it's theoretically better.
It looks like a promising one, doesn't it? That was the very first one I tried with excitement when experimenting, and I found it performed slightly inferior to some other ones.
Then I made an experiment: I hashed different value ranges to different hash table sizes and counted when collisions happen. And visualized them on a heatmap. The "low bias" hash function may be very good for the full range of inputs, but the one I finally included was a tiny bit better with lower input values and performed a tiny bit better in paq8px.
I'm now experimenting with 1) properly spreading low values (combining bytes in a hash) and 2) properly combining "more" hash values without getting too many collisions. I have to say that the original hash function in paq8px still performs surprisingly well with the "permute" spreading. But the time has come to make things better (or faster).
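For the curious, the collision-counting part of such an experiment can be as simple as this sketch (a hypothetical harness, not my actual test code):
Code:
#include <cstdint>
#include <cstdio>
#include <vector>

// Count collisions when hashing the input range [0, n) into a table of
// 2^tableBits slots (two different inputs collide when they share a slot).
uint64_t countCollisions(uint64_t (*hash)(uint64_t), uint32_t n, uint32_t tableBits) {
  std::vector<uint8_t> used(1u << tableBits, 0);
  uint64_t collisions = 0;
  for (uint32_t v = 0; v < n; v++) {
    uint32_t slot = (uint32_t)(hash(v) >> (64 - tableBits)); // take the top bits
    collisions += used[slot];
    used[slot] = 1;
  }
  return collisions;
}

int main() {
  auto h = [](uint64_t x) { return x * 0x9E3779B97F4A7C15ull; }; // one candidate
  // Sweep input ranges and table sizes; plotting these counts as a heatmap
  // shows where a given hash function is weak (e.g. with low input values).
  for (uint32_t bits = 10; bits <= 16; bits += 2)
    for (uint32_t n = 256; n <= 16384; n *= 4)
      printf("range=%u tableBits=%u collisions=%llu\n", n, bits,
             (unsigned long long)countCollisions(h, n, bits));
  return 0;
}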
Originally Posted by mpais
What do you think about merging the new model into the TextModel? It's basically modelling a different text quantization strategy, like the other masks, so it seems logical that it fits into that model. And it may be useful to combine its mask with other info available in the text model.
I first encountered the masks (f4, mask, mask2) in the old wordModel, and they (f4 at least) were a bit illogical. So I thought of including mine there, replacing one of the old ones.
Then I thought it would be better to put it in the TextModel (yeah).
Finally I decided to make a separate model:
1) I intend to include a binary charGroup as well. That is not as strong as the ascii one, however, and more importantly not finished yet;
2) I "exported" the feature mask to ModelStats so any model may benefit from it (well, not any, I had the TextModel in mind); and
3) From an object-oriented design point of view I'd like to keep objects (functions) as separate as possible, so that they don't depend too much on each other (decoupling). That's good for maintenance and allows greater flexibility and interoperability.
So I'd say let's keep it separated for now. Of course the TextModel should use it.
Originally Posted by mpais
Also, instead of a long sequence of if-else's, why not use a LUT for the first 128 elements? It would make the code cleaner.
Right now the code is also inefficient. When I experimented with the different groups the if-else way looked easier: changing the groups was simpler than maintaining a LUT. But now that the groups are tweaked enough and seem to be final, I think a LUT will be the way to go. So I agree, and I'll do that.
Originally Posted by mpais
Oh, and I see someone doesn't like hex constants :D
It depends on the semantics ;-) Which one is it?
Do you mean the 3ff vs 1023? I changed it because in the "doc" it states 1023 and in the code it was 3ff. 3ff is OK when masking, but when it means a "limit" a normal decimal value seems better (semantically). But that's just my preference. The same way I prefer "x*2+1" when the intention is "arithmetic" and "(x<<1) | 1" when it is a bit-concatenation. The result is the same, but the first-time reader catches the purpose behind the operation more easily.
Or do you mean for example "((U64(1)<<60)-1)" in the new model? It means "keep 60 bits", so it's easier to understand than 0x0fffffffffffffff. Not that the former is very easy to read at a glance ;-) But the "60" is there - and it's easy to change. I had an idea to utilize a define and use it like "& BITMASK(60)". It is certainly cleaner and the easiest to read, so I'll do that next time.
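For reference, one possible form of that define (a sketch; U64 is paq8px's 64-bit unsigned type, and n must stay below 64):
Code:
// Keep the lowest n bits:
// BITMASK(60) == ((U64(1)<<60)-1) == 0x0FFFFFFFFFFFFFFF
#define BITMASK(n) ((U64(1) << (n)) - 1)

// usage:
// hash = hash & BITMASK(60); // keep 60 bits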