# SMAC : SMall Arithmetic Coding

• 23rd May 2013, 09:20
Jean-Marie Barone
Quote:

What I propose (or better: what I have seen in LPAQ IIRC) is to have 5 mixers. Use the first one if in step 1 you collected only 1 prediction. Use the second one if in step 1 you collected 2 predictions. And so on; finally, use the fifth one if in step 1 you collected 5 predictions (i.e. all models had available predictions).
Now I get it : this would avoid the problem of high-order models that have not collected any statistics yet, and are initialized with a probability of 1/2. Thanks for the tip! Sorry if I seem a bit slow off the mark, I'm a bit confused with the terminology sometimes.

A match model could be a good solution to discard long matches. I have to study this.
• 23rd May 2013, 14:33
Piotr Tarsa
Quote:

Both of those techniques are used in various paq8 (but I think not lpaq, or possibly in zpaq).
OK, probably I mixed up something.

Quote:

One problem with disabling models for long matches is that those models will not see most of the match and will then predict the next byte badly, because they don't see the immediate context.
Really? I got the impression that you don't update or rebuild the contexts, and in that case that could be true. But for simple order-n models it should be sufficient to track the last max(all model orders) bytes and then rebuild the contexts for all models on a mismatch.

Quote:

As for using the context match length as mixer context, it does help, but I'm not sure if it works any better than just using an order 0 or 1 context to select the mixer weight vector, like is normally done in zpaq. It might.
You mean the highest order index or long match length? I think highest order index (if the limit is relatively small, like single digit) should be the best single context for selecting mixer.

Quote:

Now I get it : this would avoid the problem of high-order models that have not collected any statistics yet, and are initialized with a probability of 1/2.
Yep, mainly. Additionally, it could also have other implications - data compression is so unpredictable that you can't reliably predict when something will help or damage compression :]

Quote:

A match model could be a good solution to discard long matches.
More precisely - it would be good for avoiding many redundant (well, not 100% redundant, but practically useless) computations.
In general, removing repetitions doesn't change the load on the hash table, as repetitions do not add any new entries - the entries are already there. On the other hand, removing repetitions could help avoid problems like overfitting, I think.
• 23rd May 2013, 19:09
Matt Mahoney
> You mean the highest order index or long match length? I think highest order index (if the limit is relatively small, like single digit) should be the best single context for selecting mixer.

Probably so. I think that's why an ICM-ISSE chain works better than mixing different order ICMs. It effectively does the same thing as using the context order as a mixing context because at each step in the chain, an empty bit history selects a pair of weights to modify the prediction as p := w1*p + w2 in the logistic domain. Initially, w1 = 1, w2 = 0. If the proper action is to emphasize the lower order prediction, then the ICM will learn this by increasing w1.
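The mixing step described above can be sketched in a few lines (illustrative Python, not Matt's actual ZPAQ code; the learning rate and the omission of per-bit-history weight selection are simplifying assumptions):

```python
import math

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

class IsseStep:
    """One link of an ICM-ISSE chain: refine an incoming prediction p
    with a weight pair, p := squash(w1*stretch(p) + w2)."""
    def __init__(self, rate=0.02):
        self.w1, self.w2 = 1.0, 0.0   # identity mapping at start
        self.rate = rate              # assumed learning rate

    def predict(self, p_in):
        self.st = stretch(p_in)
        self.p = squash(self.w1 * self.st + self.w2)
        return self.p

    def update(self, bit):
        # gradient step on coding cost; in ZPAQ the (w1, w2) pair is
        # selected by the bit history, which is omitted here for brevity
        err = bit - self.p
        self.w1 += self.rate * err * self.st
        self.w2 += self.rate * err
```

With w1 = 1 and w2 = 0 the step passes the lower-order prediction through unchanged, and each observed bit nudges the weights as described.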

> But for simple order-n models it should be sufficient to track the last max(all models orders) bytes and then rebuild the contexts for all models on mismatch.

That would probably work, but PAQ doesn't do that. There would be a speed penalty, of course.
• 31st July 2013, 04:16
Jean-Marie Barone
SMAC 1.16: Bitwise, Logistic Mixing, ICM, Orders-6 to 1 & Match Model
● A context seen for the first time points to a Bit history value of 0. With version 1.16, the pairs of frequency counters linked to this Bit history entry will not be updated, so its probability will remain p(0)=1/2. Therefore, because Stretch(p(0))=log2(0.5/0.5)=0, contexts seen for the first time will have no impact on the global prediction.
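The effect is easy to check numerically (a Python sketch; Stretch/Squash here are the textbook base-2 definitions, not SMAC's fixed-point versions):

```python
import math

def stretch(p):
    return math.log2(p / (1 - p))

def squash(x):
    return 1 / (1 + 2 ** -x)   # inverse of stretch

# Two informed models plus one context seen for the first time (p = 1/2).
preds, weights = [0.8, 0.7, 0.5], [1.0, 1.0, 1.0]
mixed = squash(sum(w * stretch(p) for w, p in zip(weights, preds)))

# Stretch(1/2) = 0, so the first-seen context drops out of the weighted sum:
without = squash(sum(w * stretch(p) for w, p in zip(weights[:2], preds[:2])))
assert mixed == without
```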

● The Match model is triggered on a byte boundary. It searches, on encoding, the Source buffer history (data read so far) to find the same pattern as the current Context (8 bytes). On decoding, it will search the Destination buffer (data written so far).
The bits following that pattern will be considered as next prediction bits until a mismatch occurs, with probabilities of p(0)=65535/65536 if bit 0 is predicted, and p(0)=1/65536 if bit 1 is predicted.
Eventually, that prediction will be included in the Logistic mixing process, along with Orders-6 to 1 predictions.
The Match model uses a hash-table of 512 Kilobytes (65,536 entries of 8 bytes) to find in the buffer a pattern matching the current Context.
Each entry of the hash-table is designed like this :
http://fastpack.free.fr/MatchTable.PNG
The 64 bits of the Context are hashed to a 16-bit value. Because collisions are very likely, the low half of the current Context is compared with the 2nd double-word of the Match hash-table entry, which contains the low half of the last Context that produced the same hash.
If both values differ, a collision is detected and the Match model will terminate with a Match Prediction value of 2, meaning that no prediction could be made, and a Match Offset value of 0, meaning the model is turned "off".
If both values are equal, Match Offset value will be filled with the first double-word of the Match hash-table entry, which contains the offset in the buffer of the next byte that follows the same context. Match Prediction will be filled with the most significant bit of that byte.
In all cases (either a prediction could be carried out or not), the entry of the Match hash-table will be updated with respectively the offset of the byte following the current Context, and its low double-word value.
During the next bit encoding/decoding process, the procedure will know that a Match model is "on" by testing the Match Offset variable which will be different from 0, and will fetch the next bit of Match Prediction directly from that address. If that prediction turns out to be wrong, Match Offset will be set to 0, and a new prediction will be calculated on the next byte boundary.
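The bookkeeping above looks roughly like this (an illustrative Python sketch; SMAC itself is assembly, and `crc32 & 0xFFFF` merely stands in for its 64-to-16-bit context hash):

```python
import zlib

class MatchModel:
    def __init__(self):
        self.table = {}     # 16-bit hash -> (offset, low half of context)
        self.match_off = 0  # 0 means the model is "off"

    def byte_boundary(self, buf, pos):
        """Look up the 8-byte context ending at pos, then refresh the entry."""
        ctx = bytes(buf[pos - 8:pos])
        h = zlib.crc32(ctx) & 0xFFFF       # stand-in 64 -> 16 bit hash
        entry = self.table.get(h)
        if entry is not None and entry[1] == ctx[4:]:
            self.match_off = entry[0]      # offset of byte after same context
        else:
            self.match_off = 0             # collision or no match detected
        self.table[h] = (pos, ctx[4:])     # entry updated in all cases

    def predict_bit(self, buf, bitrank):
        """Predicted bit (MSB first), or None when the model is off."""
        if self.match_off == 0:
            return None
        return (buf[self.match_off] >> (7 - bitrank)) & 1

    def miss(self):
        self.match_off = 0   # wrong prediction: retry at next byte boundary
```

For example, in `b"abcdefghXYabcdefgh"` the context "abcdefgh" recurs, so after the second occurrence the model predicts the bits of the byte 'X' that followed the first one.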

Results on enwik8 & enwik9 are 21831822 & 183551384.
• 31st July 2013, 15:41
Matt Mahoney
• 2nd November 2013, 06:22
Jean-Marie Barone
SMAC 1.17 : Optimized for speed
After looking over Intel & Agner Fog’s documentation regarding Core Duo processors & above, I endeavoured to modify the code in order to give a boost to compression & decompression times.

I have discarded many instructions decoding to more than 1 µop or with high latencies, reorganized them to favor a 4-1-1-1 decoding scheme and, more generally, tried to take advantage of recent Intel processor features like store-forwarding, the loopback buffer, zero-latency moves, macro-fusion, etc…

The FNV-1 hash for Contexts has also been overhauled, so that the Order-4 hash uses Order-3 hash as a partial result, and likewise for Order-6 hash.
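The reuse can be sketched as follows (Python, 32-bit FNV-1; hashing the newest byte first is my assumption about how one order's hash serves as a partial result for the next):

```python
FNV_PRIME = 0x01000193
FNV_BASIS = 0x811C9DC5
MASK32 = 0xFFFFFFFF

def context_hashes(history, orders=(1, 2, 3, 4, 6)):
    """Hash the last `order` bytes of `history` with 32-bit FNV-1,
    walking newest-to-oldest so each higher order extends the
    partial result of the lower one."""
    h = FNV_BASIS
    out = {}
    for i, b in enumerate(reversed(history), start=1):
        h = ((h * FNV_PRIME) & MASK32) ^ b   # FNV-1: multiply, then xor
        if i in orders:
            out[i] = h
        if i >= max(orders):
            break
    return out
```

One pass over at most 6 bytes yields all five context hashes at once, instead of hashing each order from scratch.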

There is no improvement in the compression engine itself, yet the results are a tad better due to a change in the rounding method of the Bit History counters. Also, the lookup table for 2^32/(n0+n1) now uses floating-point values instead of the previous integers.

Results on enwik8 & enwik9 are 21816272 & 183459153.
• 4th November 2013, 17:38
Matt Mahoney
Nice improvements. http://mattmahoney.net/dc/text.html#1834

Edit: also, some improvement in the Silesia benchmark: http://mattmahoney.net/dc/silesia.html
• 18th November 2013, 06:15
Jean-Marie Barone
SMAC 1.17A : more speed optimization
I managed to gain 20 seconds more on enwik8 with my Core 2 Duo Merom, essentially by getting rid of the slow FSCALE instruction in the Squash function, while calculating 2^-p.
I also replaced some FPU instructions with faster ones, plus a few more optimization tricks.
Compression results remain identical as previous version.
• 19th November 2013, 18:07
Matt Mahoney
• 9th December 2013, 06:03
Jean-Marie Barone
SMAC 1.18 : optimized for speed (part III, final?)
* The Squash function has been entirely rebuilt using a chain of polynomial approximations (3rd degree, 32 intervals, step=1).
I've traded a bit of compression for the benefit of speed : -40 seconds on enwik8 on my Core 2 Duo.

* Until recently, I was not aware that XMM registers were considered as volatile when calling an API.
On 64-bit Windows, it was possible (though not confirmed) that calls to ReadFile & WriteFile destroyed xmm0 & xmm2, which held the Context and a mask for a hashtable.
Ergo, I have replaced xmm0 & xmm2 with xmm6 & xmm7, which are preserved across calls on 64-bit Windows.

The Squash function now uses an SSE3 instruction (fisttp). Since SSE3 was introduced in 2004 (Intel) & 2005 (AMD), I've assumed that nowadays it is available on every PC (in other words, I've been too lazy to check for SSE3 compatibility).

The (minimax) polynomial tables for the Squash function are the work & idea of qWord : thanks to him, and all MASM32 forum members for their support!
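A rough stand-in for such a piecewise fit (Python; plain cubic interpolation per unit interval rather than qWord's minimax coefficients, so the error profile differs from SMAC's):

```python
import math

def squash_exact(x):
    return 1.0 / (1.0 + 2.0 ** -x)      # inverse of a base-2 Stretch

def squash_poly(x):
    """3rd-degree polynomial approximation on 32 unit intervals over [-16, 16)."""
    x = max(-16.0, min(15.999999, x))
    a = math.floor(x)                   # interval [a, a+1), step = 1
    xs = [a + t / 3 for t in range(4)]  # 4 interpolation nodes per interval
    ys = [squash_exact(v) for v in xs]
    r = 0.0                             # evaluate the Lagrange cubic at x
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        r += term
    return r
```

A real implementation precomputes the 32 coefficient sets once; recomputing the nodes per call, as here, is only for clarity.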

Results on enwik8 & enwik9 are 21816285 & 183459860.
• 10th December 2013, 16:03
Matt Mahoney
• 18th December 2013, 03:40
Jean-Marie Barone
SMAC 1.19 : yet another speed improvement
• I have figured out that my Stretch function could be radically improved.
Indeed, if we substitute p0=(n0/(n0+n1))*2^32 in Stretch(p0)=log2(p0/(2^32-p0)), we end up with :
Stretch(p0)=log2(n0/n1)=log2(n0) - log2(n1)
with n0=bit 0 count, n1=bit 1 count, p0=prediction that the next bit will be 0, and each n_i ranging over [1;65535]

Thus, there was no need to calculate p0 & p1. Compression of enwik8 is now less than 5 min. under my Core 2 Duo.
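The equivalence of the two forms is quick to verify numerically (Python sketch):

```python
import math

SCALE = 2 ** 32

def stretch_direct(n0, n1):
    # original form: compute p0 explicitly, then stretch it
    p0 = n0 / (n0 + n1) * SCALE
    return math.log2(p0 / (SCALE - p0))

def stretch_fast(n0, n1):
    # the simplification above: two log-table lookups, no division
    return math.log2(n0) - math.log2(n1)
```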
• The Context hashes are now calculated earlier in order to increase the prefetch scheduling distance : prefetch instructions benefit from this additional leeway to complete successfully.

Results on enwik8 & enwik9 are 21816323 & 183459942.
• 19th December 2013, 19:12
Matt Mahoney
• 16th January 2014, 17:28
Jean-Marie Barone
SMAC 1.20 : using Laplace estimator
I have replaced my prediction formula p0=n0/(n0+n1) with Laplace’s rule of succession, which gives better results in estimating the probability of the next symbol.
From now on, p0=(n0+1)/(n0+n1+2), and Stretch(p0)=log2(n0+1)-log2(n1+1).
I made some experiments with the Krichevsky-Trofimov estimator, where
p0=(n0+0.5)/(n0+n1+1), and Stretch(p0)=log2(2n0+1)-log2(2n1+1), but results were not as good.
One positive side-effect of the Laplace estimator is that it solves the zero-frequency problem : with the previous formula, I had to initialize the n0 and n1 counts to one to avoid a division by zero. Now the frequencies can range over [0;65535].
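Both estimators side by side (Python; the formulas are exactly those from the post):

```python
import math

def p0_laplace(n0, n1):
    # Laplace's rule of succession: well-defined even at n0 = n1 = 0
    return (n0 + 1) / (n0 + n1 + 2)

def stretch_laplace(n0, n1):
    return math.log2(n0 + 1) - math.log2(n1 + 1)

def p0_kt(n0, n1):
    # Krichevsky-Trofimov estimator, tried but rejected in the post
    return (n0 + 0.5) / (n0 + n1 + 1)

def stretch_kt(n0, n1):
    return math.log2(2 * n0 + 1) - math.log2(2 * n1 + 1)
```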

The Match model has been improved. The bit following a matching pattern of 8 bytes is now predicted with a probability of p0=1-(1/x) if bit 0 is predicted, and p0=1/x if bit 1 is predicted, with x=64 and x=x+1 after each correct prediction.
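As a tiny sketch of that confidence schedule (Python; resetting to 64 on a mismatch is my assumption, the post only specifies growth on correct predictions):

```python
class MatchConfidence:
    """Adaptive match probability: starts at 1 - 1/64, tightens per hit."""
    def __init__(self, start=64):
        self.start = start
        self.x = start

    def p0(self, predicted_bit):
        # p0 = 1 - 1/x when bit 0 is predicted, 1/x when bit 1 is
        return 1 - 1 / self.x if predicted_bit == 0 else 1 / self.x

    def hit(self):
        self.x += 1                # correct prediction: grow confidence

    def miss(self):
        self.x = self.start        # assumption: restart at the base value
```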

Results on enwik8 & enwik9 are 21781544 & 183190888.
• 16th January 2014, 18:53
Matt Mahoney
Results for Silesia corpus.

Code:

```
  Silesia dicke mozil   mr   nci ooff  osdb reym samba  sao webst x-ray  xml Compressor -options
--------- ----- ----- ---- ----- ---- ----- ---- ----- ---- ----- ----- ---- -------------------
 48093438  2322 16009 2446  1787 2270  2725 1104  3596 4892  6458  4078  400 smac 1.20
 48253662  2331 16034 2453  1819 2271  2733 1111  3608 4902  6489  4093  403 smac 1.17
 48254125  2331 16035 2453  1819 2271  2733 1111  3608 4902  6489  4093  404 smac 1.19
 48254226  2331 16035 2453  1819 2271  2733 1111  3608 4902  6489  4093  404 smac 1.18
 48272013  2333 16035 2454  1823 2271  2733 1112  3609 4903  6495  4093  404 smac 1.16
```
Edit: LTCB results. http://mattmahoney.net/dc/text.html#1831
• 3rd June 2018, 02:45
Jean-Marie Barone
SMAC 1.22 : Secondary Symbol Estimation
● SSE improves the prediction by collecting further statistics in a 2D-table, featuring an Order-2 context and a stretched prediction ranging from -16 to +16 (33 boundaries).
My method roughly follows the one described by Matt in the latest PAQ archivers.
Each entry of the table stores a pair of 16 bit counters reflecting the observed bits for a given context & prediction.
From these counters, a new stretched prediction will be calculated.

http://fastpack.free.fr/sse.png

As in PAQ, both counters are halved when one of them saturates, and the initial counters are set so that each prediction maps to itself.
The SSE prediction is calculated as a linear interpolation between the boundaries above & below, and the final prediction is equal to 3/4 SSE prediction + 1/4 initial prediction.
The entry closest to the interpolation point is updated once the actual bit is known.
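A simplified sketch of that table (Python; the counter width, the 0.5 smoothing in the cell estimate, and the 256-row context are illustrative choices, not SMAC's exact values):

```python
import math

class SSE:
    """2D SSE: 33 boundaries over stretched predictions in [-16, +16],
    one row of (n0, n1) counter pairs per context."""
    def __init__(self, n_ctx=256):
        self.t = [[self._init_pair(b - 16) for b in range(33)]
                  for _ in range(n_ctx)]

    @staticmethod
    def _init_pair(st):
        # seed counters so each boundary initially maps to itself
        p1 = 1 / (1 + 2 ** -st)                  # squash(boundary)
        n1 = max(1, min(255, round(p1 * 256)))
        return [256 - n1, n1]                    # [zero count, one count]

    @staticmethod
    def _p1(cell):
        n0, n1 = cell
        return (n1 + 0.5) / (n0 + n1 + 1)

    def predict(self, ctx, st):
        st = max(-16.0, min(16.0, st))
        lo = min(int(math.floor(st)) + 16, 31)   # boundary below st
        frac = (st + 16) - lo                    # position inside interval
        row = self.t[ctx]
        self._last = (ctx, lo + 1 if frac >= 0.5 else lo)  # closest entry
        return (1 - frac) * self._p1(row[lo]) + frac * self._p1(row[lo + 1])

    def update(self, bit):
        ctx, i = self._last                      # update closest entry only
        cell = self.t[ctx][i]
        cell[bit] += 1
        if cell[bit] >= 65535:                   # halve on saturation
            cell[0] //= 2
            cell[1] //= 2
```

The returned probability would then be blended as 3/4 SSE + 1/4 initial prediction, per the post.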

● In hindsight, it appeared to me that my Match model was overly complicated.
Henceforth, each entry of the Match table is a simple doubleword containing the offset of a matching pattern, given current context (8 bytes).
Both contexts (current context & Matching pattern) are still compared to detect hash collisions.
Also, I have increased the memory allocated for the table.

● FPU instructions have been replaced by SIMD ones (SSE & SSE2), except in the creation of the Logarithms lookup table.

● Some optimization in the code.

● Source code has been ported to UASM.

Results on enwik8 & enwik9 are 21047188 & 176607820.
• 12th July 2018, 17:22
Jean-Marie Barone
SMAC 1.23 : 3D SSE
This version uses a 3D-table SSE which inputs:
- a stretched prediction from -16 to +16
- an Order-2 context
- the current bit rank (8 segments from 0 to 7)

Some good results can also be achieved using only 2 segments (high & low nibble), or 4 segments.
Bit rank is a useful piece of information; including it in the Modeling improves overall prediction.

Results on enwik8 & enwik9 are 20930754 & 175759034.
Results on Silesia Corpus :

Code:

```
           Total    dickens mozilla   mr   nci ooff  osdb reym samba  sao webst x-ray  xml
SMAC 1.21  47822648    2319   15829 2445  1787 2261  2725 1102  3543 4892  6436  4078  399
SMAC 1.22  46544819    2271   15421 2401  1593 2233  2608 1060  3479 4849  6242  4000  382
SMAC 1.23  45592332    2269   15002 2285  1589 2147  2467 1057  3485 4789  6234  3883  379
```
• 13th July 2018, 06:04
Gonzalo
Thanks for sharing this, Jean-Marie! I see it fits the use case of text-like data well. It even outperforms ppmd on my test set (although at the cost of speed).

Now, could you elaborate a little on memory requirements? I aim to try it on a few big files but I don't want to risk halting the system. Thanks in advance!
• 13th July 2018, 13:17
Jean-Marie Barone
It uses a lot of memory, but stays below its 2 GB limit of address space. Still, I don't recommend using it with files greater than 1 GB, as it would be too slow & impractical. You have better alternatives if you wish to perform serious backups.

Edit : Also, I forgot to mention SMAC was not designed to compress/decompress files longer than 4 GB, as it uses a 32-bit variable to store the length of the file.

Code:

```
LOCAL FileLength:DWORD

        call    GetFileSize,@@SourceHandle,0
        mov     FileLength,eax          ;store original size
```
• 14th July 2018, 20:56
Gonzalo
Thanks for the info. I was just giving it a try to learn its limits; no serious backup involved. The method seems symmetric, so it has no practical value to me as a day-to-day program, but it's good to see new ideas put to work!