Hi! Here's the proposed paper about PAQ:
sites.google.com/site/toffer86/130111-preprint-no-doi.pdf
I've partly finished incorporating all changes. If you find any quirks, let me know.
Last edited by toffer; 13th January 2012 at 18:59. Reason: link updates
M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
Nice reading, thank you... it comes in handy as a reference for a university project.
1. Perhaps you should restate formula 3 next to formula 25 to avoid confusion, because I was confused.
2. In my opinion you don't have to include the sum of weights in the denominator, as this complicates the math. You can just assume that the sum is one and modify the update rule accordingly.
3. Maybe you should state in 2) that your expected code length is the cross-entropy.
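On point 3, a minimal numeric check (plain Python; the function name and numbers are mine, chosen for illustration):

```python
import math

def expected_code_length(p_true, p_model):
    """Expected bits per symbol when the source emits 1 with probability
    p_true but we code using model probability p_model.
    This is exactly the cross-entropy H(p_true, p_model)."""
    return -(p_true * math.log2(p_model)
             + (1 - p_true) * math.log2(1 - p_model))

# With a perfect model the cross-entropy reduces to the source entropy;
# any model mismatch only increases the expected code length.
h = expected_code_length(0.2, 0.2)   # entropy of the source
c = expected_code_length(0.2, 0.3)   # mismatched model, c > h
```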
Last edited by Sebastian; 12th January 2012 at 00:03.
Thanks for the interesting paper. I didn't realize that logistic mixing and geometric mixing are equivalent.
One minor point: in PAQ7/8 the weights are constrained to the range 0 to 1, as in your analysis, but they are not constrained to sum to 1. In ZPAQ I found that widening the bounds on the weights to -8 to 8 improved compression. For example, a MIX2 (weights in [0,1], summing to 1) usually gives worse compression than a 2-input MIX (weights range from -8 to 8 and need not sum to 1). The only reason to have a MIX2 is that it is faster and uses less memory.
In PAQ, weights are 16 bits. This made it easy to implement a mixer in MMX/SSE2. In ZPAQ, weights are 20 bits. I implemented predict() in SSE2, but update() is faster in generic x86. I couldn't scale the weights to 16 bits without losing too much compression.
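The effect Matt describes can be sketched like this (a toy Python model, not the actual PAQ/ZPAQ code; the learning rate and training loop are made up for illustration):

```python
import math

def stretch(p):
    return math.log(p / (1 - p))   # logistic-domain value ln(p/(1-p))

def squash(x):
    return 1 / (1 + math.exp(-x))  # inverse of stretch

class Mixer:
    """Logistic mixer in the spirit of ZPAQ's MIX: weights need not
    sum to 1 and are only clamped to [-8, 8]."""
    def __init__(self, n, lr=0.02):
        self.w = [0.0] * n
        self.lr = lr

    def predict(self, probs):
        self.st = [stretch(p) for p in probs]
        return squash(sum(w * s for w, s in zip(self.w, self.st)))

    def update(self, p, bit):
        err = bit - p  # gradient of the coding cost w.r.t. the mix
        for i, s in enumerate(self.st):
            self.w[i] = min(8.0, max(-8.0, self.w[i] + self.lr * err * s))

# Two inputs: one model tracks the data, the other is uninformative.
m = Mixer(2)
for t in range(2000):
    bit = t % 2
    p = m.predict([0.9 if bit else 0.1, 0.5])
    m.update(p, bit)
```

After training, the useful weight has grown well past 1 (it would be stuck at 1 under a [0,1] constraint), while the uninformative input keeps weight 0 — which is exactly why the wider bound helps.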
If you assume hardware FP16 support, do you think it would work better?
You mean the "half" data type? (EXR uses it to represent pixels: sign + 5-bit exponent + 10-bit mantissa.) The reason I need 20 bits for mixer weights is to reduce cumulative roundoff errors when the weights are updated. When mixing, you could throw away the low 8 bits without any effect on compression. But if you did that, you would have to update the weights probabilistically, which I don't think would be faster.
No, I mean storing "probabilities" as FP16. The log-scale should give it an edge I think.
Maybe. In ZPAQ, probabilities are stored in the logistic domain as ln(p/(1-p)), as 12-bit signed numbers with 6 bits after the decimal point. All computation is done with simple integer arithmetic. FP16 should have enough range and precision. I haven't looked at performance. Hardware conversion between integer and FP16 could possibly be faster than the lookup tables I use now to convert between the 12-bit logistic and 15-bit linear representations. I assume this is on a GPU?
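A rough model of those two representations (my own Python sketch; the exact ZPAQ tables and rounding differ):

```python
import math

# The logistic value t = ln(p/(1-p)) stored as a 12-bit signed integer
# with 6 fractional bits, so t spans [-32, 32) in steps of 1/64.
def to_fixed(p):
    t = math.log(p / (1 - p))
    return max(-2048, min(2047, round(t * 64)))

# The linear probability as a 15-bit integer in [1, 32767].
def to_linear15(t_fixed):
    p = 1 / (1 + math.exp(-t_fixed / 64))
    return max(1, min(32767, round(p * 32768)))

# Round trip for a mid-range probability stays close to the input.
t = to_fixed(0.7)
p15 = to_linear15(t)
```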
Bulldozer comes with native FP16 format conversions. The cycle counts are in line with integer operations, so you wouldn't need to fall back to integers for speed. I'm just wondering whether we have now reached the threshold where low-precision FP can be used effectively. In the past I did sufficiently fast FP16 conversions on vectors (as regular int-FP operations mix on SSE2), which is of course another thing. With CVT16 we may have acceptable scalar conversion.
The problem with integer probabilities is always that you don't get into the interesting regions of very low probability (e.g. 0.000001). The rounding loss for arithmetic coding isn't that dramatic, but I imagine for mixing it could be interesting to store better pronounced low probabilities.
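To put rough numbers on that (my own back-of-the-envelope check in Python):

```python
import math

# Smallest probability a 15-bit linear integer can express:
lin_min = 1 / 32768                   # about 3.1e-5, far above 1e-6

# Smallest probability a 12-bit logistic value (6 fractional bits,
# range [-32, 32)) can express: squash(-32)
log_min = 1 / (1 + math.exp(32))      # about 1.3e-14

# For comparison: FP16's smallest normal value is 2**-14 (about 6.1e-5)
# and its smallest subnormal is 2**-24 (about 6e-8), so a linear FP16
# probability goes lower than a 15-bit integer, and storing the
# logistic value itself goes lower still.
fp16_min_subnormal = 2.0 ** -24
```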
Last edited by Ethatron; 18th January 2012 at 11:15.
Matt needs high precision near 0 and 1. His lookups provide that (I think), while floats give high precision only near 0. Workarounds for that could nullify any performance advantages.
You'd use signed floats for that kind of precision.
What do you mean? If we are dealing with probabilities in the range [0, 1], then one can map [0.5, 1] to [-0.5, 0] by subtracting 1 (i.e. this is the workaround I mentioned in an earlier post), so that numbers near zero represent either very high or very low probabilities. But then dealing with such a representation means lots of ifs.
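A small illustration of that mapping (my own toy example, using float32 via struct for convenience; FP16 is narrower still, so the effect is stronger there):

```python
import struct

def f32(x):
    """Round a Python float to float32 precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

p_high = 1 - 1e-9
# Stored directly, float32 rounds p_high to exactly 1.0, because the
# gap between adjacent float32 values just below 1 is about 6e-8:
direct = f32(p_high)
# Stored as p - 1 (a small negative number near zero), the tail
# survives, since floats have high relative precision near zero:
shifted = f32(p_high - 1)
```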
I would rather try to improve the logarithm estimation, i.e. the squash and stretch functions.
There are no "ifs" when you deal with SSE code. Consider it a "no problem".
I was asking whether it helps his precision, not whether the implementation is convenient and fits into a C-style model.
Why not try it for yourself?