
Thread: SMAC : SMall Arithmetic Coding

  1. #61
    Member Jean-Marie Barone (Versailles, France)
    > What I propose (or better: what I have seen in LPAQ, IIRC) is to have 5 mixers. Use the first one if in step 1 you collected only 1 prediction. Use the second one if in step 1 you collected 2 predictions. And so on; finally, use the fifth one if in step 1 you collected 5 predictions (i.e. all models had available predictions).
    Now I get it: this would avoid the problem of high-order models that have not collected any statistics yet and are initialized with a probability of 1/2. Thanks for the tip! Sorry if I seem a bit slow off the mark; I get confused by the terminology sometimes.

    A match model could be a good solution to discard long matches. I have to study this.
    Last edited by Jean-Marie Barone; 23rd May 2013 at 10:39. Reason: typo

  2. #62
    Member (Kraków, Poland)
    > Both of those techniques are used in various paq8 versions (but I think not lpaq, or possibly in zpaq).
    OK, probably I mixed something up.

    > One problem with disabling models for long matches is that those models will not see most of the match and will then predict the next byte badly, because they don't see the immediate context.
    Really? I got the impression that you neither update the contexts nor rebuild them; in that case it could be true. But for simple order-n models it should be sufficient to track the last max(all model orders) bytes and then rebuild the contexts for all models on a mismatch.

    > As for using the context match length as mixer context, it does help, but I'm not sure if it works any better than just using an order-0 or order-1 context to select the mixer weight vector, as is normally done in zpaq. It might.
    You mean the highest order index or the long match length? I think the highest order index (if the limit is relatively small, like a single digit) should be the best single context for selecting the mixer.

    > Now I get it : this would avoid the problem of high-order models that have not collected any statistics yet, and are initialized with a probability of 1/2.
    Yep, mainly. Additionally, it could have other implications - data compression is so unpredictable that you can't reliably tell in advance whether something will help or hurt compression :]

    > A match model could be a good solution to discard long matches.
    More precisely - it would be good for avoiding many redundant (well, not 100% redundant, but practically useless) computations.
    In general, removing repetitions doesn't change the load on the hash table, since repetitions do not add any new entries - the entries are already there. On the other hand, removing repetitions could help avoid problems like overfitting, I think.

  3. #63
    Expert Matt Mahoney (Melbourne, Florida, USA)
    > You mean the highest order index or long match length? I think highest order index (if the limit is relatively small, like single digit) should be the best single context for selecting mixer.

    Probably so. I think that's why an ICM-ISSE chain works better than mixing ICMs of different orders. It effectively does the same thing as using the context order as a mixing context, because at each step in the chain an empty bit history selects a pair of weights that modify the prediction as p := w1*p + w2 in the logistic domain. Initially w1 = 1, w2 = 0. If the proper action is to emphasize the lower-order prediction, then the ISSE will learn this by increasing w1.
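The step described above can be sketched in a few lines (an illustrative toy in Python, not zpaq's actual ISSE; the learning rate and the dictionary-backed weight table are assumptions):

```python
import math

def stretch(p):  # probability -> logistic domain
    return math.log2(p / (1 - p))

def squash(st):  # inverse of stretch
    return 1 / (1 + 2 ** -st)

class ISSE:
    """One link of an ICM-ISSE chain: (context, bit history) -> weight pair."""
    def __init__(self, rate=0.02):
        self.weights = {}              # (context, history) -> [w1, w2]
        self.rate = rate

    def predict(self, cx, hist, p_in):
        w1, w2 = self.weights.setdefault((cx, hist), [1.0, 0.0])
        self.st_in = stretch(p_in)     # remember inputs for the update step
        self.key = (cx, hist)
        return squash(w1 * self.st_in + w2)

    def update(self, p_out, bit):
        err = bit - p_out              # prediction error
        w = self.weights[self.key]
        w[0] += self.rate * err * self.st_in
        w[1] += self.rate * err

# With an empty history the initial pair (1, 0) passes p_in through unchanged:
isse = ISSE()
p = isse.predict(cx=0, hist=0, p_in=0.7)
```

With the initial weights the chain is transparent; only once a bit history accumulates evidence do the weights drift away from (1, 0).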

    > But for simple order-n models it should be sufficient to track the last max(all models orders) bytes and then rebuild the contexts for all models on mismatch.

    That would probably work, but PAQ doesn't do that. There would be a speed penalty, of course.

  4. #64
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.16: Bitwise, Logistic Mixing, ICM, Orders-6 to 1 & Match Model

    ● A context seen for the first time points to a bit-history value of 0. With version 1.16, the pair of frequencies linked to this bit-history entry will not be updated, so its probability remains p(0)=1/2. Since Stretch(p(0))=log2(0.5/0.5)=0, contexts seen for the first time have no impact on the global prediction.
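This can be checked directly: in logistic mixing the predictions are stretched, weighted and summed, so a model stuck at p=1/2 contributes exactly 0 whatever its weight (a toy sketch, not SMAC's actual mixer; the weights are arbitrary):

```python
import math

def stretch(p):
    return math.log2(p / (1 - p))

def squash(st):
    return 1 / (1 + 2 ** -st)

def mix(preds, weights):
    # logistic mixing: weighted sum of stretched predictions, squashed back
    return squash(sum(w * stretch(p) for p, w in zip(preds, weights)))

with_new = mix([0.8, 0.3, 0.5], [0.4, 0.4, 0.4])   # third model unseen: p = 1/2
without  = mix([0.8, 0.3],      [0.4, 0.4])        # same mix without it
```

Since stretch(1/2) = 0, the two mixes are identical.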

    ● The Match model is triggered on a byte boundary. On encoding, it searches the Source buffer history (data read so far) for the same pattern as the current context (8 bytes). On decoding, it searches the Destination buffer (data written so far).
    The bits following that pattern are used as the next predicted bits until a mismatch occurs, with probabilities p(0)=65535/65536 if bit 0 is predicted and p(0)=1/65536 if bit 1 is predicted.
    Eventually, that prediction will be included in the Logistic mixing process, along with Orders-6 to 1 predictions.
    The Match model uses a hash-table of 512 kilobytes (65,536 entries of 8 bytes) to find in the buffer a pattern matching the current context.
    Each entry of the hash-table holds two double-words: the offset of the byte following the last occurrence of the context, and the low half of that context (used to detect collisions).

    The 64 bits of the context are hashed to a 16-bit value. Because collisions are very likely, the low half of the current context is compared with the 2nd double-word of the hash-table entry, which contains the low half of the last context that produced the same hash.
    If the values differ, a collision is detected and the Match model terminates with a Match Prediction value of 2 (meaning no prediction could be made) and a Match Offset value of 0 (meaning the model is turned "off").
    If the values are equal, Match Offset is filled with the first double-word of the entry, which contains the offset in the buffer of the byte that followed the same context, and Match Prediction is filled with the most significant bit of that byte.
    In both cases (whether a prediction could be carried out or not), the hash-table entry is updated with the offset of the byte following the current context and the low double-word of that context.
    During the next bit encoding/decoding step, the procedure knows the Match model is "on" because Match Offset is non-zero, and fetches the next bit of Match Prediction directly from that address. If the prediction turns out to be wrong, Match Offset is reset to 0 and a new prediction is attempted at the next byte boundary.
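The table logic above can be sketched as follows (a simplified Python model; the 16-bit hash width and the two-double-word entry layout follow the description, while the FNV constants and everything else are illustrative):

```python
# Simplified sketch of the match-model hash table described above.
# Each entry: (offset of the byte after the context, low 32 bits of the context).
FNV_PRIME, FNV_BASIS = 0x01000193, 0x811C9DC5

def hash16(ctx64):
    h = FNV_BASIS
    for i in range(8):                        # fold the 8 context bytes
        h = ((h * FNV_PRIME) & 0xFFFFFFFF) ^ ((ctx64 >> (8 * i)) & 0xFF)
    return h & 0xFFFF                         # 16-bit slot index

def match_lookup(table, buf, pos):
    """Return the stored offset on a verified hit, None on miss/collision,
    then always refresh the entry for the current position."""
    ctx = int.from_bytes(buf[pos - 8:pos], "little")   # last 8 bytes
    slot = hash16(ctx)
    entry = table.get(slot)
    hit = None
    if entry is not None and entry[1] == ctx & 0xFFFFFFFF:
        hit = entry[0]                        # offset of the predicted byte
    table[slot] = (pos, ctx & 0xFFFFFFFF)     # update entry in all cases
    return hit

table, buf = {}, b"abcdefghXabcdefghY"
first = match_lookup(table, buf, 8)           # context "abcdefgh", no entry yet
second = match_lookup(table, buf, 17)         # same context seen again -> hit
```

On the second lookup the model would predict the byte at the returned offset ('X' here), which then mismatches the actual 'Y' and turns the model off, exactly the recovery path the post describes.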

    Results on enwik8 & enwik9 are 21831822 & 183551384.
    Attached Files
    Last edited by Jean-Marie Barone; 31st July 2013 at 03:20. Reason: typo

  5. #65
    Expert Matt Mahoney (Melbourne, Florida, USA)


  7. #66
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.17 : Optimized for speed

    After looking over Intel's & Agner Fog's documentation on Core Duo processors & above, I endeavoured to modify the code to speed up compression & decompression.

    I have discarded many instructions that decode to more than 1 µop or have high latencies, reorganized them to favor a 4-1-1-1 decoding scheme and, more generally, tried to take advantage of recent Intel features like store-forwarding, the loopback buffer, zero-latency moves, macro-fusion, etc.

    The FNV-1 hash for Contexts has also been overhauled, so that the Order-4 hash uses Order-3 hash as a partial result, and likewise for Order-6 hash.
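Incremental reuse like this can be sketched as follows (32-bit FNV-1; one way to make the reuse work is to feed bytes newest-first, so each higher-order hash extends the lower-order one with older bytes - whether SMAC orders the bytes this way is an assumption):

```python
FNV_PRIME, FNV_BASIS = 0x01000193, 0x811C9DC5
MASK = 0xFFFFFFFF

def fnv1_step(h, byte):
    # FNV-1: multiply by the prime first, then XOR in the next byte
    return ((h * FNV_PRIME) & MASK) ^ byte

def context_hashes(history):
    """Order-3/4/6 hashes of the most recent bytes, sharing partial results.
    Bytes are fed newest-first so the order-4 hash extends the order-3 one."""
    h = FNV_BASIS
    hashes = {}
    for order, byte in enumerate(reversed(history[-6:]), start=1):
        h = fnv1_step(h, byte)
        if order in (3, 4, 6):
            hashes[order] = h      # snapshot the running hash at each order
    return hashes

hs = context_hashes([10, 20, 30, 40, 50, 60])
```

Each higher-order hash costs one (or two) extra FNV steps instead of a full rehash of the context.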

    There is no improvement in the compression engine itself; still, the results are a tad better due to a change in the rounding method of the bit-history counters. Also, the lookup table for 2^32/(n0+n1) now uses floating-point values instead of the previous integers.

    Results on enwik8 & enwik9 are 21816272 & 183459153.
    Attached Files

  8. #67
    Expert Matt Mahoney (Melbourne, Florida, USA)
    Nice improvements. http://mattmahoney.net/dc/text.html#1834

    Edit: also, some improvement in the Silesia benchmark: http://mattmahoney.net/dc/silesia.html
    Last edited by Matt Mahoney; 4th November 2013 at 17:22.


  10. #68
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.17A : more speed optimization

    I managed to gain another 20 seconds on enwik8 with my Core 2 Duo (Merom), essentially by getting rid of the slow FSCALE instruction in the Squash function while calculating 2^-p.
    I also replaced some FPU instructions with faster ones, plus a few more optimization tricks.
    Compression results remain identical to the previous version.
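For reference, the Squash being discussed is the base-2 logistic curve, the inverse of Stretch; the 2^-p term is what FSCALE was computing. A quick numeric check (illustrative Python, not the assembly):

```python
import math

def stretch(p):
    return math.log2(p / (1 - p))

def squash(x):
    # squash(x) = 1 / (1 + 2^-x); the 2^-x factor is the FSCALE-style term
    return 1 / (1 + 2 ** -x)

roundtrip = squash(stretch(0.9))   # squash inverts stretch
midpoint = squash(0)               # 0 in the logistic domain is p = 1/2
```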
    Attached Files

  11. #69
    Expert Matt Mahoney (Melbourne, Florida, USA)


  13. #70
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.18 : optimized for speed (part III, final?)

    * The Squash function has been entirely rebuilt using a chain of polynomial approximations (3rd degree, 32 intervals, step=1).
    I've traded a bit of compression for the benefit of speed : -40 seconds on enwik8 on my Core 2 Duo.
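A piecewise-polynomial Squash along these lines can be sketched like so (interpolating cubics rather than qWord's minimax polynomials, so the coefficients and accuracy are illustrative only; the 32 intervals and step = 1 are from the post):

```python
def squash(x):
    # squash(x) = 1 / (1 + 2^-x), the inverse of stretch
    return 1.0 / (1.0 + 2.0 ** (-x))

def cubic_through(f, a):
    """Exact cubic through 4 equally spaced samples of f on [a, a+1]."""
    xs = [a + t / 3 for t in range(4)]
    ys = [f(x) for x in xs]
    def poly(x):
        total = 0.0
        for i in range(4):            # Lagrange basis
            term = ys[i]
            for j in range(4):
                if j != i:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total
    return poly

# 32 cubics covering [-16, 16), one per unit interval (step = 1)
PIECES = [cubic_through(squash, a) for a in range(-16, 16)]

def squash_poly(x):
    i = min(max(int(x // 1) + 16, 0), 31)   # select the interval
    return PIECES[i](x)

err = max(abs(squash_poly(k / 50) - squash(k / 50)) for k in range(-800, 800))
```

Because squash is very smooth, a cubic per unit interval already tracks it to a few parts in 10^5, which is why trading the exact FPU evaluation for table lookups costs so little compression.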

    * Until recently, I was not aware that XMM registers are considered volatile when calling an API.
    On 64-bit Windows, it was possible (though not confirmed) that calls to ReadFile & WriteFile destroy xmm0 & xmm2, which held the context and a mask for a hash-table.
    Ergo, I have replaced xmm0 & xmm2 with xmm6 & xmm7, which are preserved across calls under the x64 calling convention.


    The Squash function now uses an SSE3 instruction (fisttp). Since SSE3 was introduced in 2004 (Intel) & 2005 (AMD), I've assumed that nowadays it is available on every PC (in other words, I've been too lazy to check for SSE3 support).

    The (minimax) polynomial tables for the Squash function are the work & idea of qWord : thanks to him, and all MASM32 forum members for their support!

    Results on enwik8 & enwik9 are 21816285 & 183459860.
    Attached Files
    Last edited by Jean-Marie Barone; 9th December 2013 at 05:04. Reason: some bugs with font

  14. #71
    Expert Matt Mahoney (Melbourne, Florida, USA)
    Last edited by Matt Mahoney; 10th December 2013 at 15:40.


  16. #72
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.19 : yet another speed improvement

    • I have figured out that my Stretch function could be radically improved.
      Indeed, if we substitute p0=(n0/(n0+n1))*2^32 into Stretch(p0)=log2(p0/(2^32-p0)), we end up with:
      Stretch(p0)=log2(n0/n1)=log2 n0 - log2 n1
      with n0 = bit 0 count, n1 = bit 1 count, p0 = prediction that the next bit will be 0, and ni ranging over [1;65535].

      Thus, there is no need to calculate p0 & p1 at all. Compression of enwik8 now takes less than 5 min. on my Core 2 Duo.
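The simplification is easy to verify numerically (illustrative Python; the 2^32 scale matches the formulas above):

```python
import math

SCALE = 2 ** 32

def stretch_old(n0, n1):
    p0 = (n0 / (n0 + n1)) * SCALE          # scaled probability of a 0 bit
    return math.log2(p0 / (SCALE - p0))

def stretch_new(n0, n1):
    return math.log2(n0) - math.log2(n1)   # no p0 needed at all

# Compare both forms over a spread of counter values
diff = max(abs(stretch_old(a, b) - stretch_new(a, b))
           for a in (1, 7, 400, 65535) for b in (1, 9, 1234, 65535))
```

Algebraically p0/(2^32-p0) reduces to n0/n1, so the two forms agree up to floating-point rounding.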
    • The context hashes are now calculated earlier in order to increase the prefetch scheduling distance: prefetch instructions benefit from this additional leeway to complete in time.


    Results on enwik8 & enwik9 are 21816323 & 183459942.
    Attached Files


  18. #73
    Expert Matt Mahoney (Melbourne, Florida, USA)


  20. #74
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.20 : using Laplace estimator

    I have replaced my prediction formula p0=n0/(n0+n1) with Laplace's rule of succession, which gives better results when estimating the probability of the next symbol.
    From now on, p0=(n0+1)/(n0+n1+2), and Stretch(p0)=log2(n0+1)-log2(n1+1).
    I made some experiments with the Krichevsky-Trofimov estimator, where p0=(n0+0.5)/(n0+n1+1) and Stretch(p0)=log2(2n0+1)-log2(2n1+1), but the results were not as good.
    One positive side-effect of the Laplace estimator is that it solves the zero-frequency problem: with the previous formula, I had to initialize the n0 and n1 counts to one to avoid a division by zero. Now the frequencies can range over [0;65535].
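Both estimators and their stretched forms can be written out directly (illustrative Python; the stretch identities follow from the ratio p0/(1-p0)):

```python
import math

def stretch(p):
    return math.log2(p / (1 - p))

def laplace_p0(n0, n1):
    return (n0 + 1) / (n0 + n1 + 2)        # rule of succession

def laplace_stretch(n0, n1):
    return math.log2(n0 + 1) - math.log2(n1 + 1)

def kt_p0(n0, n1):
    return (n0 + 0.5) / (n0 + n1 + 1)      # Krichevsky-Trofimov

def kt_stretch(n0, n1):
    return math.log2(2 * n0 + 1) - math.log2(2 * n1 + 1)

# Both estimators are defined at n0 = n1 = 0, unlike n0/(n0+n1)
cold_start = laplace_p0(0, 0)
```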

    The Match model has been improved. The bit following a matching pattern of 8 bytes is now predicted with probability p0=1-(1/x) if bit 0 is predicted and p0=1/x if bit 1 is predicted, with x=64 initially and x incremented after each correct prediction.
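In other words, the match model's confidence grows with the length of the verified match. A toy sketch of this rule (x=64 and the increment are from the post; the reset-on-mismatch value is an assumption):

```python
class MatchConfidence:
    """Predicts the next bit copied from a match, more firmly the longer it holds."""
    def __init__(self):
        self.x = 64                      # initial confidence

    def p0(self, predicted_bit):
        # probability that the next bit is 0
        return 1 - 1 / self.x if predicted_bit == 0 else 1 / self.x

    def update(self, correct):
        if correct:
            self.x += 1                  # longer match -> stronger prediction
        else:
            self.x = 64                  # restart (reset value is an assumption)

m = MatchConfidence()
start = m.p0(0)                          # 63/64 on a fresh match
for _ in range(64):
    m.update(True)
later = m.p0(0)                          # 127/128 after 64 correct bits
```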

    Results on enwik8 & enwik9 are 21781544 & 183190888.
    Attached Files

  21. #75
    Expert Matt Mahoney (Melbourne, Florida, USA)
    Results for Silesia corpus.

    Code:
      Silesia dickens mozilla    mr   nci ooffice  osdb reymont samba   sao webster x-ray  xml Compressor -options
    --------- ------- ------- ----- ----- ------- ----- ------- ----- ----- ------- ----- ---- -------------------
     48093438    2322   16009  2446  1787    2270  2725    1104  3596  4892    6458  4078  400 smac 1.20
     48253662    2331   16034  2453  1819    2271  2733    1111  3608  4902    6489  4093  403 smac 1.17
     48254125    2331   16035  2453  1819    2271  2733    1111  3608  4902    6489  4093  404 smac 1.19
     48254226    2331   16035  2453  1819    2271  2733    1111  3608  4902    6489  4093  404 smac 1.18
     48272013    2333   16035  2454  1823    2271  2733    1112  3609  4903    6495  4093  404 smac 1.16
    Edit: LTCB results. http://mattmahoney.net/dc/text.html#1831
    Last edited by Matt Mahoney; 17th January 2014 at 15:41.


  23. #76
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.22 : Secondary Symbol Estimation

    ● SSE improves the prediction by collecting further statistics in a 2D table, indexed by an order-2 context and a stretched prediction ranging from -16 to +16 (33 boundaries).
    My method roughly follows the one described by Matt in the latest PAQ archivers.
    Each entry of the table stores a pair of 16-bit counters reflecting the observed bits for a given context & prediction.
    From these counters, a new stretched prediction is calculated.



    Like in PAQ, both counters are halved when one of them saturates, and the initial counters are set so that each prediction maps to itself.
    The SSE prediction is calculated by linear interpolation between the boundaries above & below, and the final prediction is 3/4 SSE prediction + 1/4 initial prediction.
    The entry closest to the interpolation point is updated once the actual bit is known.
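A minimal sketch of such an SSE stage (the 33 boundaries, the interpolation, the closest-entry update and the 3/4-1/4 blend are from the post; SMAC stores counter pairs per cell, while a single adapting probability is used here for brevity, and the learning rate is an assumption):

```python
import math

def stretch(p):
    return math.log2(p / (1 - p))

def squash(x):
    return 1 / (1 + 2 ** -x)

class SSE:
    """2D table refining a prediction: rows are contexts, the 33 columns sit
    at stretched boundaries -16..+16.  Each cell starts at the probability
    it represents, so the table is initially the identity map."""
    RATE = 0.02

    def __init__(self):
        self.rows = {}

    def _row(self, cx):
        if cx not in self.rows:
            self.rows[cx] = [squash(b - 16) for b in range(33)]
        return self.rows[cx]

    def refine(self, cx, p):
        st = min(max(stretch(p), -16.0), 16.0)
        lo = min(int(math.floor(st)) + 16, 31)      # lower boundary index
        frac = (st + 16.0) - lo
        row = self._row(cx)
        sse_p = row[lo] * (1 - frac) + row[lo + 1] * frac   # interpolate
        self._last = (cx, lo + 1 if frac >= 0.5 else lo)    # closest entry
        return 0.75 * sse_p + 0.25 * p        # final = 3/4 SSE + 1/4 input

    def update(self, bit):
        cx, i = self._last                    # nudge the closest entry
        row = self.rows[cx]
        row[i] += self.RATE * (bit - row[i])

sse = SSE()
p1 = sse.refine(cx=0, p=0.7)                  # ~0.7: table starts as identity
```

After repeatedly seeing 1-bits in this (context, prediction) cell, the refined output rises above the raw input prediction, which is exactly the correction SSE is meant to learn.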

    ● With hindsight, it appeared to me that my Match model was overly complicated.
    Henceforth, each entry of the Match table is a simple double-word containing the offset of a pattern matching the current context (8 bytes).
    Both contexts (the current context & the matching pattern's context) are still compared to detect hash collisions.
    Also, I have increased the memory allocated to the table.

    ● FPU instructions have been replaced by SIMD ones (SSE & SSE2), except in the creation of the logarithm lookup table.

    ● Some optimization in the code.

    ● Source code has been ported to UASM.

    Results on enwik8 & enwik9 are 21047188 & 176607820.
    Attached Files
    Last edited by Jean-Marie Barone; 3rd June 2018 at 02:06. Reason: forgot results


  25. #77
    Member Jean-Marie Barone (Versailles, France)

    SMAC 1.23 : 3D SSE

    This version uses a 3D-table SSE whose inputs are:
    - a stretched prediction from -16 to +16
    - an order-2 context
    - the current bit rank (8 segments, from 0 to 7)

    Good results can also be achieved using only 2 segments (high & low nibble), or 4 segments.
    The bit rank is a useful piece of information; including it in the modeling improves the overall prediction.

    Results on enwik8 & enwik9 are 20930754 & 175759034.
    Results on Silesia Corpus :

    Code:
               Total     dickens mozilla   mr   nci ooffice osdb reymont samba  sao webster x-ray  xml
    SMAC 1.21  47822648     2319   15829 2445  1787    2261 2725    1102  3543 4892    6436  4078  399
    SMAC 1.22  46544819     2271   15421 2401  1593    2233 2608    1060  3479 4849    6242  4000  382
    SMAC 1.23  45592332     2269   15002 2285  1589    2147 2467    1057  3485 4789    6234  3883  379
    Attached Files


  27. #78
    Member (Argentina)
    Thanks for sharing this, Jean-Marie! I see it fits the use case of text-like data well. It even outperforms ppmd on my test set (although at the cost of speed).

    Now, could you elaborate a little on the memory requirements? I aim to try it on a few big files, but I don't want to risk halting the system. Thanks in advance!

  28. #79
    Member Jean-Marie Barone (Versailles, France)
    It uses a lot of memory, but stays below its 2 GB address-space limit. Still, I don't recommend using it on files larger than 1 GB, as it would be too slow & impractical. You have better alternatives if you wish to perform serious backups.


    Edit: Also, I forgot to mention that SMAC was not designed to compress/decompress files larger than 4 GB, as it uses a 32-bit variable to store the file length.

    Code:
    LOCAL FileLength:DWORD

    call    GetFileSize,@@SourceHandle,0
    mov     FileLength,eax          ;store original size (32 bits -> 4 GB max)
    Last edited by Jean-Marie Barone; 13th July 2018 at 21:53.

  29. #80
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    467
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Thanks for the info. I was just giving it a try to learn its limits; no serious backup involved. It seems the method is symmetric, so it has no practical value for me as a day-to-day program, but it's good to see new ideas put to work!
