I've put a new preprint on my site (accepted at the Data Compression Conference 2013). It analyzes worst-case code length bounds for linear and geometric mixtures (a generalization of PAQ mixing), coupled with online gradient descent for weight estimation. To my knowledge, this is the first theoretical analysis of PAQ. I hope someone will find it useful.

Have a look here: https://sites.google.com/site/toffer86/documents

Cheers
toffer