Code:
- MatchModel: removed SmallStationaryContextMap, slightly improved the StateMap-based contexts
- Set MatchModel CM scale to 64 for a small gain
* Improvements in IndirectModel; introducing LargeIndirectContext
- Added 1 (binary) context to SparseBitModel
- Increased (doubled) memory use of SparseModel and IndirectModel
- NormalModel now includes the BlockType in its contexts
* Introducing probabilistic hash replacement strategy for ContextMap, ContextMap2 and LargeStationaryMap
- Other small fixes and cosmetic changes
Two highlights:
* Improvements in IndirectModel; introducing LargeIndirectContext
IndirectModel got the same treatment in this version that SparseModel got in v197: a complete review / rewrite. There is a noticeable gain for most files, and also a noticeable speed loss due to the newly added contexts.
* Introducing probabilistic hash replacement strategy for ContextMap, ContextMap2 and LargeStationaryMap
The problem with the hash replacement strategy up until now was that it tended to keep "large count" context statistics for a long time. For semi-stationary files this was a good strategy, as contexts in such files are usually long-term. But for mixed-content files (for example those having multiple block types) it led to memory pressure: statistics for obsolete contexts were not evicted from the hash table, and new, low-count contexts were overwritten instead. The current change addresses this issue. As a result, expect a gain for larger mixed-content files (such as silesia/mozilla), and a small gain, no gain or a small loss for semi-stationary files (such as silesia/dickens and enwik*). The solution may be tuned further, but I think for now it is somewhat "balanced" between these two types of files. As a rare sufferer, silesia/webster (being a large semi-stationary file) loses some KBs.
As this new strategy also performs move-to-front, it has a small speed benefit: frequently accessed hash items are found at the front of a hash bucket, so finding them needs fewer iterations. This speed gain cannot completely cancel out the speed loss from the IndirectModel improvements, so v199 is still slower than v198.
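To illustrate the idea (this is not the actual paq8px code), here is a minimal sketch of a hash bucket with move-to-front and probabilistic replacement. Everything in it is an assumption made for the example: the names Slot, Bucket, access and contains, the 4-slot bucket, the rule of choosing the victim among the last two slots, and the fixed 1-in-8 chance (kEvictOdds) of evicting the higher-count candidate. The random source is passed in as a callable so the behaviour can be tested deterministically.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slot: a checksum identifying the context plus a hit count.
struct Slot {
    std::uint16_t checksum = 0;
    std::uint8_t  count    = 0;   // 0 means "empty"
};

// Hypothetical 4-slot bucket; slot 0 is the front (most recently used).
struct Bucket {
    static constexpr std::size_t   kSlots     = 4;
    static constexpr std::uint32_t kEvictOdds = 8;  // assumed tuning value
    std::array<Slot, kSlots> slots{};

    // Shift slots [0, i) back by one and place slot i at the front.
    void moveToFront(std::size_t i) {
        assert(i < kSlots);
        Slot s = slots[i];
        for (; i > 0; --i) slots[i] = slots[i - 1];
        slots[0] = s;
    }

    bool contains(std::uint16_t checksum) const {
        for (const Slot& s : slots)
            if (s.count != 0 && s.checksum == checksum) return true;
        return false;
    }

    // Find `checksum`, or replace a victim. `rng` is any callable that
    // returns an unsigned integer. The returned slot is always slots[0].
    template <class Rng>
    Slot& access(std::uint16_t checksum, Rng&& rng) {
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (slots[i].count != 0 && slots[i].checksum == checksum) {
                if (slots[i].count < 255) ++slots[i].count;  // saturating
                moveToFront(i);   // hot items are found early next time
                return slots[0];
            }
        }
        // Miss: the candidates are the two least recently used slots.
        const std::size_t last = kSlots - 1, prev = kSlots - 2;
        const std::size_t lo = slots[last].count <= slots[prev].count ? last : prev;
        const std::size_t hi = (lo == last) ? prev : last;
        std::size_t victim = lo;  // old strategy: always evict the low count
        // New: occasionally evict the high-count (possibly obsolete) entry,
        // unless the low-count candidate is simply an empty slot.
        if (slots[lo].count != 0 && rng() % kEvictOdds == 0) victim = hi;
        slots[victim] = Slot{checksum, 1};
        moveToFront(victim);
        return slots[0];
    }
};
```

With a random source that never triggers the probabilistic branch, a context that once reached a high count stays in the bucket indefinitely while newer low-count contexts keep replacing each other, which is the behaviour described above for the old strategy. With the probabilistic branch enabled, such an entry is eventually evicted once it stops being accessed.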
paq8px_v196 -8 mozilla: 6955820
paq8px_v197 -8 mozilla: 6919615 (-36K) <- rewritten SparseModel
paq8px_v198 -8 mozilla: 6915818 (-3K)
paq8px_v199 -8 mozilla: 6879971 (-35K) <- rewritten IndirectModel + probabilistic hash replacement
Multimedia files are not (really) affected (images, audio, jpg).