hi.
maybe i will find some spare time to work on tarsalzp someday. currently i'm thinking about the techniques to use in it.
my current scheme is:
1. mixed order-4 and order-8 lzp model. it either outputs one predicted symbol and codes a flag indicating whether that symbol occurred, or it reports that context confirmation failed, in which case no flag is coded.
2. if the symbol wasn't coded by the previous step, then code it using an aridemo-like order-2 coder.
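to make the two steps concrete, here is a minimal sketch of step 1 only, with a single order-4 table instead of the mixed order-4/order-8 model. the names (`slot_of`, `lzp_events`, `lzp_decode`), the hash constant, the table size and the use of the full context as the confirmation check are all assumptions made for illustration, not how tarsalzp does it. literals carried by "miss" and "unconfirmed" events are what step 2 would have to code.

```python
def slot_of(ctx, bits):
    # Hypothetical multiplicative hash of the context bytes into a table slot.
    h = int.from_bytes(ctx, "little") * 2654435761
    return (h >> 8) & ((1 << bits) - 1)


def lzp_events(data, order=4, bits=16):
    """Turn bytes into LZP events; literals would be handed to the step-2 coder."""
    table = [None] * (1 << bits)        # slot -> (context, predicted byte)
    events = []
    for i, sym in enumerate(data):
        ctx = bytes(data[max(0, i - order):i])
        slot = slot_of(ctx, bits)
        entry = table[slot]
        if entry is not None and entry[0] == ctx:
            # Context confirmed: a flag says whether the prediction was right.
            if entry[1] == sym:
                events.append(("hit",))
            else:
                events.append(("miss", sym))
        else:
            # Confirmation failed: no flag at all, just the literal.
            events.append(("unconfirmed", sym))
        table[slot] = (ctx, sym)        # remember the latest successor
    return events


def lzp_decode(events, order=4, bits=16):
    """Rebuild the bytes by replaying the same table updates as the encoder."""
    table = [None] * (1 << bits)
    out = bytearray()
    for ev in events:
        ctx = bytes(out[-order:])
        slot = slot_of(ctx, bits)
        sym = table[slot][1] if ev[0] == "hit" else ev[1]
        out.append(sym)
        table[slot] = (ctx, sym)
    return bytes(out)
```

the decoder never needs to be told that confirmation failed, because it can run the same check on its own table. a real implementation would store a few check bits per slot instead of the whole context.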
i will work on step 1 myself, as almost nobody uses rank0 (or lzp in my terminology) codecs. the exception may be toffer, who uses it in cmm, but cmm is a cm, which is very different.
i plan to improve step 2. i want to implement an aridemo-like order-3 coder and an fpaq02f-like order-1 coder, both with symbol masking (exclusion). what i don't know is how to do escape modelling and how to handle 'pollution'.
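a rough sketch of what symbol masking buys, assuming plain frequency counts in the fallback coder. `code_cost` and the example counts are made up for illustration. the idea is that when the lzp step predicted a symbol and the flag said "miss", that symbol is known to be wrong, so it can be removed from the distribution before coding the real one.

```python
import math


def code_cost(freqs, sym, excluded=()):
    """Ideal cost in bits of coding `sym`, ignoring the symbols in `excluded`."""
    if sym in excluded:
        raise ValueError("an excluded symbol cannot be coded")
    # Only the symbols that are still possible share the probability mass.
    total = sum(f for s, f in freqs.items() if s not in excluded)
    return -math.log2(freqs[sym] / total)


# Hypothetical counts in some order-2 context.
freqs = {"e": 6, "t": 1, "a": 1}
plain = code_cost(freqs, "t")                    # 3.0 bits
masked = code_cost(freqs, "t", excluded={"e"})   # 1.0 bit
```

the gain is largest exactly when the excluded symbol was the most frequent one in the lower-order context, which is the usual case after a failed lzp prediction.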
what's 'pollution'? a typical example is pdfs. they contain a lot of text and a lot of random (compressed) data. that compressed data pollutes the statistics gathered on the text, and on top of that the compressed data itself gets expanded by a large amount. what i want is some workaround for it.
my current workaround is delayed addition of symbols: a symbol is added to the statistics on its second occurrence. that improves performance on pdfs a lot (it's usually better than ppmii on that type of data) but hurts on many other types of more or less textual data.
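one possible reading of delayed addition, as a minimal sketch: each context keeps a set of symbols seen once ("pending"), and a symbol only gets a real count when it shows up a second time. the class name `DelayedModel`, the use of a set and the initial count of 1 are assumptions for illustration only.

```python
from collections import defaultdict


class DelayedModel:
    """Per-context counts where a symbol is admitted on its second occurrence."""

    def __init__(self):
        self.counts = defaultdict(dict)    # ctx -> {symbol: count}
        self.pending = defaultdict(set)    # ctx -> symbols seen exactly once

    def update(self, ctx, sym):
        counts = self.counts[ctx]
        if sym in counts:
            counts[sym] += 1               # already admitted, count normally
        elif sym in self.pending[ctx]:
            self.pending[ctx].discard(sym)
            counts[sym] = 1                # second occurrence: admit it
        else:
            self.pending[ctx].add(sym)     # first occurrence: only remember it


model = DelayedModel()
model.update(b"th", ord("e"))   # first time: not in the statistics yet
model.update(b"th", 0x9F)       # one-off byte, as in compressed data
model.update(b"th", ord("e"))   # second time: admitted with count 1
```

random data rarely repeats a symbol in the same context, so most of it stays in the pending set and never touches the counts. the cost is the one described above: on text every new symbol has to be coded through an escape twice before it becomes predictable.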
if nobody knows an efficient solution for the 'pollution' problem then maybe i will skip it, as it's not very important.
the most important thing for me is efficient escape modelling.