Hello folks,
While I was spending my holiday on a campsite in France, I was experimenting with bit sequences to try to improve the prediction ratios within my (ehm...) ppm+cm compression algorithm.
As I expected, it is very rewarding to dynamically adjust the order in which the bits are predicted. The number of wrong guesses decreased by around 5%, so I was very happy.
But when I came home the compression (not the prediction) became worse.
First I will try to explain what I did:
My algorithm is more or less byte-wise, and I compress the predicted byte bit by bit. In the old version I started with bit 7 and counted down to 0. In the new version I take the most predictable bit first and then, each time, the most predictable of the bits that are left. This process is reversible, so the file can be decompressed. In my opinion the "best predictable bit" is the one with the highest score = |0.5 - p|, so the score must be as big as possible.
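To make the idea concrete, here is a minimal sketch in Python. It is not my real coder: predict() and its probability table are hypothetical stand-ins for the ppm+cm model, and the bits are only collected in a list instead of being passed to an arithmetic coder. It only shows how the bit order follows from score = |0.5 - p| and why the decoder can reproduce that order.

```python
# Hypothetical toy model: P(bit = 1) per bit position. A real model would
# use the context and the bits already known; this one ignores `known`.
TOY_TABLE = {7: 0.50, 6: 0.90, 5: 0.45, 4: 0.20, 3: 0.60, 2: 0.99, 1: 0.35, 0: 0.52}

def toy_predict(known):
    return TOY_TABLE

def best_bit(probs, remaining):
    """Position in `remaining` whose P(bit = 1) is furthest from 0.5."""
    return max(remaining, key=lambda pos: abs(0.5 - probs[pos]))

def encode_bits(byte, predict):
    """Return (order, bits): the bit positions in coding order and their values."""
    known, order, bits = {}, [], []
    remaining = list(range(7, -1, -1))
    while remaining:
        pos = best_bit(predict(known), remaining)
        bit = (byte >> pos) & 1
        order.append(pos)
        bits.append(bit)          # a real coder would code `bit` here
        known[pos] = bit
        remaining.remove(pos)
    return order, bits

def decode_bits(bits, predict):
    """Rebuild the byte: the decoder makes the same choices from the same model."""
    known = {}
    remaining = list(range(7, -1, -1))
    for bit in bits:
        pos = best_bit(predict(known), remaining)
        known[pos] = bit
        remaining.remove(pos)
    return sum(bit << pos for pos, bit in known.items())

order, bits = encode_bits(0x41, toy_predict)
print(order)
print(decode_bits(bits, toy_predict) == 0x41)
```

The choice of the next position depends only on the model and on bits that were already coded, which is what makes the reordering reversible.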
Some results on world95.txt:
Old version, unoptimised
Code:
output: 490b, 47,88, 47,88% compression - 2917kb left - n: 120 w: 1150
output: 940b, 45,94, 45,94% compression - 2916kb left - n: 126 w: 2290
output: 1350b, 43,96, 43,96% compression - 2915kb left - n: 127 w: 3268
output: 1781b, 43,5, 43,5% compression - 2914kb left - n: 128 w: 4320
output: 2219b, 43,35, 43,35% compression - 2913kb left - n: 130 w: 5426
output: 2598b, 42,29, 42,29% compression - 2912kb left - n: 130 w: 6350
output: 2964b, 41,36, 41,36% compression - 2911kb left - n: 133 w: 7241
output: 3332b, 40,67, 40,67% compression - 2910kb left - n: 133 w: 8113
output: 3660b, 39,72, 39,72% compression - 2909kb left - n: 133 w: 8904
output: 4010b, 39,16, 39,16% compression - 2908kb left - n: 135 w: 9729
New version, optimised
Code:
output: 494b, 48,27, 48,27% compression - 2917kb left - n: 121 w: 1034
output: 941b, 45,95, 45,95% compression - 2916kb left - n: 125 w: 2020
output: 1351b, 43,99, 43,99% compression - 2915kb left - n: 128 w: 2912
output: 1780b, 43,48, 43,48% compression - 2914kb left - n: 130 w: 3848
output: 2218b, 43,33, 43,33% compression - 2913kb left - n: 135 w: 4818
output: 2600b, 42,33, 42,33% compression - 2912kb left - n: 135 w: 5634
output: 2978b, 41,56, 41,56% compression - 2911kb left - n: 135 w: 6415
output: 3355b, 40,97, 40,97% compression - 2910kb left - n: 135 w: 7215
output: 3682b, 39,96, 39,96% compression - 2909kb left - n: 135 w: 7869
output: 4035b, 39,41, 39,41% compression - 2908kb left - n: 137 w: 8580
('n' stands for neutral (p = 0.5) and 'w' stands for wrong (error > 0.5). Compression ratios should be the other way around, but I personally favor this way.)
Why oh why don't the extra 1000 correctly predicted bits give me a better ratio? That's the question!
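One thing that may be worth checking (an assumption, not something the numbers above prove): 'w' only counts on which side of 0.5 the prediction falls, while the size of the output depends on how confident each prediction was. The ideal cost of coding a bit is -log2 of the probability that was given to the value that actually occurred. The sketch below uses made-up predictions, not output from my coder, to show that fewer wrong guesses can still cost more bits.

```python
import math

def cost_bits(p_one, bit):
    """Ideal code length in bits when the model says P(bit = 1) = p_one."""
    p = p_one if bit == 1 else 1.0 - p_one
    return -math.log2(p)

def summarize(predictions):
    """predictions: list of (p_one, actual_bit) -> (wrong, neutral, total_bits)."""
    neutral = sum(1 for p, _ in predictions if p == 0.5)
    wrong = sum(1 for p, b in predictions if p != 0.5 and (p > 0.5) != (b == 1))
    total = sum(cost_bits(p, b) for p, b in predictions)
    return wrong, neutral, total

# Made-up model A: always fairly confident, wrong twice.
model_a = [(0.9, 1)] * 8 + [(0.9, 0)] * 2
# Made-up model B: wrong only once, but very confident on that one bit.
model_b = [(0.6, 1)] * 9 + [(0.99, 0)]

wrong_a, _, bits_a = summarize(model_a)
wrong_b, _, bits_b = summarize(model_b)
print(wrong_a, round(bits_a, 2))
print(wrong_b, round(bits_b, 2))
```

In this toy case model B has fewer wrong guesses than model A but needs more bits in total, because a single confident miss is very expensive.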