1. ## Sequence of bits

Hello folks,

While I was spending my holiday at a campsite in France, I experimented with bit sequences to try to improve the prediction ratios within my (ehm...) ppm+cm compression algorithm.

As I expected, it is very rewarding to dynamically adjust the sequence of bits for prediction. The number of wrong guesses decreased by around 5%, so I was very happy.

But when I came home, the compression (not the prediction) turned out to be worse.

First, let me explain what I did:
My algorithm is more or less byte-wise, and I compress the predicted byte bit by bit. In the old version I started with bit 7 and counted down to 0. In the new version I take the most predictable bit first, then the next most predictable one, and so on. This process is reversible, so the file can be decompressed. In my view the "most predictable bit" is the one whose score = |0.5 - p| is as big as possible.
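
As an illustration (not the actual code from this thread), a minimal sketch of that scoring rule, with made-up estimates for the 8 bits:

Code:
```
# Rank the 8 bit positions by how far the estimate p[i] = P(bit i == 1) is from 0.5.
p = [0.25, 0.4, 0.6, 0.5, 0.7, 0.35, 0.45, 0.9]   # hypothetical estimates for bits 0..7
order = sorted(range(8), key=lambda i: abs(0.5 - p[i]), reverse=True)
print(order)   # [7, 0, 4, 5, 1, 2, 6, 3] - bit 7 is coded first, bit 3 (p = 0.5) last
```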

Some results on world95.txt:
Old version, unoptimised
Code:
```
output: 490b, 47,88, 47,88% compression - 2917kb left - n: 120 w: 1150
output: 940b, 45,94, 45,94% compression - 2916kb left - n: 126 w: 2290
output: 1350b, 43,96, 43,96% compression - 2915kb left - n: 127 w: 3268
output: 1781b, 43,5, 43,5% compression - 2914kb left - n: 128 w: 4320
output: 2219b, 43,35, 43,35% compression - 2913kb left - n: 130 w: 5426
output: 2598b, 42,29, 42,29% compression - 2912kb left - n: 130 w: 6350
output: 2964b, 41,36, 41,36% compression - 2911kb left - n: 133 w: 7241
output: 3332b, 40,67, 40,67% compression - 2910kb left - n: 133 w: 8113
output: 3660b, 39,72, 39,72% compression - 2909kb left - n: 133 w: 8904
output: 4010b, 39,16, 39,16% compression - 2908kb left - n: 135 w: 9729
```
New version, optimised
Code:
```
output: 494b, 48,27, 48,27% compression - 2917kb left - n: 121 w: 1034
output: 941b, 45,95, 45,95% compression - 2916kb left - n: 125 w: 2020
output: 1351b, 43,99, 43,99% compression - 2915kb left - n: 128 w: 2912
output: 1780b, 43,48, 43,48% compression - 2914kb left - n: 130 w: 3848
output: 2218b, 43,33, 43,33% compression - 2913kb left - n: 135 w: 4818
output: 2600b, 42,33, 42,33% compression - 2912kb left - n: 135 w: 5634
output: 2978b, 41,56, 41,56% compression - 2911kb left - n: 135 w: 6415
output: 3355b, 40,97, 40,97% compression - 2910kb left - n: 135 w: 7215
output: 3682b, 39,96, 39,96% compression - 2909kb left - n: 135 w: 7869
output: 4035b, 39,41, 39,41% compression - 2908kb left - n: 137 w: 8580
```
('n' stands for neutral (p = 0.5) and 'w' stands for wrong (error > 0.5). Compression ratios should be the other way around, but I personally favor this way.)

Why oh why don't the extra 1000 bits of correct predictions deliver a better ratio? That's the question!

2. You tuned the wrong metric - the correct metric would be lowest overall entropy:

sum k=1..K -log (a_k + b_k p_k)

a_k = 1 - y_k
b_k = 2y_k - 1

where

k - current coding step
y_k - actual encountered bit
p_k - estimation for P(y_k=1)
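
In code, that metric is just the measured code length in bits; a small illustrative sketch (not from this thread):

Code:
```
import math

def coding_cost_bits(bits, probs):
    """bits[k] is the actual bit y_k, probs[k] is the estimate p_k = P(y_k = 1)."""
    total = 0.0
    for y, p in zip(bits, probs):
        a = 1 - y
        b = 2 * y - 1
        total += -math.log2(a + b * p)   # -log2(p) if y == 1, -log2(1 - p) if y == 0
    return total

# Two predictions with the same skew |0.5 - p| = 0.4 but different outcomes:
print(coding_cost_bits([1, 0], [0.9, 0.9]))   # ~0.15 bits + ~3.32 bits
```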

3. I think this might help you, but then again it might not. In the past I have played with pure arithmetic bit compressors, where you work one bit at a time, as well as plain pure 256-symbol compressors. You would think that doing it a bit at a time might give better compression, since you get 8 tries at getting the compression right.
At first I believed the bit-at-a-time approach was better. But then I noticed that its results actually depend on how close the bit patterns of the symbols are to each other.
In both cases I used pure arithmetic coding, which means the actual symbol order is not important: if you take, for example, the BWTS or UNBWTS or any other permutation of a file, it compresses to the same size. Those sizes differ, however, between the bit-at-a-time and the byte-at-a-time coders.
This is not the case with most arithmetic compressors people use, since the internal state carried is usually quite small and rounding effects start to show up.

With the pure 256-symbol arithmetic compressor, if the original file was the same except that different symbols were used, it would still compress to the same length.

However, with the bit-at-a-time version you get different compressed lengths when different sets of symbols are used. Then I noticed it was possible to weight the bit symbols by their position in the byte, so you could get the same results as with the pure 256-symbol compressor.

Even though it is slower, I like doing it a bit at a time. My current version of arb255 seems better for most files than a pure 256-symbol arithmetic compressor. However, when I first started testing it against other so-called pure arithmetic compressors, it seemed to do worse.

That is, until I realized that a pure arithmetic compressor has to compress all permutations to the same size. So when I find a pure arithmetic compressor that tests better for a given file, I check how well it compresses the BWTS or UNBWTS of that file, and then mine beats it.

4. Originally Posted by toffer You tuned the wrong metric - the correct metric would be lowest overall entropy:

sum k=1..K -log (a_k + b_k p_k)

a_k = 1 - y_k
b_k = 2y_k - 1

where

k - current coding step
y_k - actual encountered bit
p_k - estimation for P(y_k=1)
Thanks Toffer. The only problem is that y_k isn't available beforehand. If I understand it correctly now, better predictions do not necessarily give a lower entropy?
I will make a cheating version with your algorithm to compare the entropy-optimised version with the version I have made. Maybe the optimal bit sequence of the past is useful for the future.

5. Of course during actual processing the bit is unknown. But for tuning it's no problem.

Maybe you should clarify precisely what you do.

Better predictions give better compression, since log(.) is a strictly monotonic function. If your metric only takes |0.5 - p_k| into account, it actually doesn't measure anything compression-related - only the "skew" of p_k. Calculating the entropy requires knowing the actual symbol, since you need to know its code length (which is -log( prob_of_symbol )).
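
To put made-up numbers on this: a single wrong guess made at p = 0.99 costs -log2(0.01) ≈ 6.64 bits, which is more than the combined cost of five wrong guesses made at p = 0.55 (5 * -log2(0.45) ≈ 5.76 bits). So a metric that only counts wrong guesses, or only looks at the skew, can improve while the actual output size gets worse.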

Of course there's correlation between processed data and new data. Otherwise modeling wouldn't work. Reusing that is essential.

EDIT: If you want to evaluate a prediction's quality, you need to compare the estimate with the actual outcome - that is the error in your logic.

6. My actual learning model is not the same thing as my very simple 'bit-sequence model'. The actual model is fine-tuned to optimise entropy and should not be confused with the 'bit-sequence' experiment.

The 'bit-sequence' experiment only chooses which of the 8 bits of the byte to predict next. (8 bits, because my byte-wise model delivers this information.)
For example:
Bit 0: p = 0.5;
Bit 1: p = 0.4;
...
Bit 7: p = 0.9;
I first choose bit 7 for actual compression, since I am 90% sure my prediction for it will be correct, and after that the probabilities of the other bits are recalculated.

After recalculation 7 bits are left:
Bit 0: p = 0.25;
Bit 1: p = 0.45;
...
Bit 6: p = 0.5;

The next most predictable bit is bit 0, because I am 75% sure that bit 0 will be 0, and the process goes on until all bits are coded.

It feels like this should decrease the overall entropy instead of increasing it?
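
As an illustration of that per-byte loop (a hedged sketch, not the actual implementation: the dummy model below just returns fixed probabilities, whereas the real model would re-estimate after every coded bit):

Code:
```
class DummyModel:
    def predict(self, coded):
        """Return {bit_position: P(bit == 1)} for every position not yet coded."""
        fixed = {0: 0.25, 1: 0.4, 2: 0.6, 3: 0.5, 4: 0.7, 5: 0.35, 6: 0.45, 7: 0.9}
        return {pos: p for pos, p in fixed.items() if pos not in coded}

def code_byte_greedy(model, actual_byte):
    coded = {}                                   # bit position -> actual bit value
    trace = []
    while len(coded) < 8:
        probs = model.predict(coded)             # re-estimate the remaining bits
        pos = max(probs, key=lambda i: abs(0.5 - probs[i]))
        bit = (actual_byte >> pos) & 1
        trace.append((pos, probs[pos], bit))     # an arithmetic coder would encode 'bit' with probs[pos] here
        coded[pos] = bit
    return trace

print(code_byte_greedy(DummyModel(), ord('e')))  # bit 7 is coded first, then bit 0, ...
```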

7. The quality of that process heavily depends on the model's confidence. A confident model estimate doesn't mean that it's correct. And again, remember the "EDIT" I wrote in my last post.

AFAIU your method tries to dynamically construct a symbol decomposition.

I'd suggest changing the metric to entropy and estimating each bit's entropy to select the decomposition. Along the way you still need to estimate the probabilities.

How do you perform probability estimation?

8. It's the worst case if you determine the bit order adaptively...
byte statistics would become a complete mess.
But it's not that trivial even if you'd try to use a static order
different from 7-to-0 - it's only possible to directly compute
the best order if the model is very simple.
Usually the brute-force search is the only real solution.
But then, it is actually possible to acquire better compression
with a different alphabet decomposition...
One example of that is
http://ctxmodel.net/files/BWT_reorder_v2.rar
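
A sketch of what such a brute-force search over static bit orders could look like (illustrative only: the toy order-0 counter model, the sample file and the sample size stand in for the real setup, and the full 8! search is slow in pure Python):

Code:
```
import itertools, math

def code_length(data, order):
    """Measured code length in bits when coding each byte's bits in the given static order."""
    counts = {}                                   # (bit position, bits coded so far) -> [n0, n1]
    total = 0.0
    for byte in data:
        coded = ()
        for pos in order:
            bit = (byte >> pos) & 1
            c = counts.setdefault((pos, coded), [1, 1])   # Laplace-smoothed bit counts
            p1 = c[1] / (c[0] + c[1])
            total += -math.log2(p1 if bit else 1.0 - p1)
            c[bit] += 1
            coded += ((pos, bit),)
    return total

data = open("world95.txt", "rb").read(1024)       # any small sample file
best = min(itertools.permutations(range(8)), key=lambda o: code_length(data, o))
print(best, code_length(data, best), code_length(data, (7, 6, 5, 4, 3, 2, 1, 0)))
```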

9. Originally Posted by toffer How do you perform probability estimation?
A context mixer taking into account the following properties:
5 bits for matches up to order 32
3 bits for 8 different levels of p for the sum of every match in a particular order
3 bits for the position of a match. Closer is mostly better, but sometimes it isn't. (works very well!)

11 bits in total, and behind that a tree structure blending all these contexts into the prediction that is actually used.
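
A rough guess (field names and packing are mine, not the poster's code) at how such an 11-bit context index could be assembled:

Code:
```
def pack_context(match_order, p_level, match_pos):
    """Pack 5 + 3 + 3 = 11 bits of context into a single index (0..2047)."""
    assert 0 <= match_order < 32 and 0 <= p_level < 8 and 0 <= match_pos < 8
    return (match_order << 6) | (p_level << 3) | match_pos

print(pack_context(17, 5, 2))   # one of 2048 possible contexts
```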

10. Originally Posted by Shelwien It's the worst case if you determine the bit order adaptively...
byte statistics would become a complete mess.
But it's not that trivial even if you'd try to use a static order
different from 7-to-0 - it's only possible to directly compute
the best order if the model is very simple.
Usually the brute-force search is the only real solution.
But then, it is actually possible to acquire better compression
with a different alphabet decomposition...
One example of that is
http://ctxmodel.net/files/BWT_reorder_v2.rar
You're saying that my prediction model gets screwed up by the adaptive bit sequence? That actually makes sense.

11. So do you actually want to do such a construction at every byte coding step - or determine a decomposition once for a data set and then use it?

You just described some contexts used in estimation for (?) a secondary model? What do your counters look like?

EDIT:

To clarify, I think such a construction might actually help, as long as the whole process isn't recurrent: probability estimation influences the decomposition AND the decomposition influences estimation.

And you have to take the whole symbol into account: not just pick a certain bit position based only on the next step's outcome, but construct the whole decomposition with regard to the whole outcome. That could be described by analogy with greedy vs. optimal parsing. But I cannot think of an easy (and efficient) way to solve that problem. Also, only reordering bit positions limits the decomposition to some MSB<->LSB shuffling.

12. In theory, the order in which the bits are coded should not matter. If a and b are 2 bits, then P(ab) = P(a)P(b|a) = P(b)P(a|b). Of course in practice it does matter depending on the details of how you compute the probabilities. You can do experiments in which you reorder the bits in each byte before compression and test with lots of different compressors.
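
A tiny numeric check of that identity (the joint distribution is arbitrary; with exact conditional probabilities the code length is the same whichever bit is coded first):

Code:
```
import math

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # P(a, b)

def cost_a_first(a, b):
    pa = sum(p for (x, _), p in joint.items() if x == a)
    return -math.log2(pa) - math.log2(joint[(a, b)] / pa)

def cost_b_first(a, b):
    pb = sum(p for (_, y), p in joint.items() if y == b)
    return -math.log2(pb) - math.log2(joint[(a, b)] / pb)

print(cost_a_first(1, 0), cost_b_first(1, 0))   # both equal -log2 P(a=1, b=0) = -log2(0.2)
```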

13. After reconfiguring my model, compression is as good as before. Not better, but about the same. Shelwien and Matt both seem to be right in their statements.
