Results 1 to 13 of 13

Thread: Sequence of bits

  1. #1
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts

    Sequence of bits

    Hello folks,

    While I was spending my holiday on a campsite in France, I was experimenting with bit sequences to try to improve the prediction ratios within my (ehm...) ppm+cm compression algorithm.

    As I expected, it is very rewarding to dynamically adjust the sequence of bits for prediction. The number of wrong guesses decreased by around 5%, so I was very happy.

    But when I came home, the compression (not the prediction) turned out to be worse.

    First let me explain what I did:
    My algorithm is more or less byte-wise, and I compress the predicted byte bit by bit. In the old version I started with bit 7 and counted down to 0. In the new version I take the best predictable bit first, then the next best predictable one, and so on. This process is reversible, so the file can be decompressed. In my opinion the "best predictable bit" is the one with the highest score |0.5 - p|.
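
    Roughly, the selection step looks like this (a simplified sketch, not my actual code; p[0..7] are the model's estimates that each bit is a 1, and coded_mask marks the bits already coded):
    Code:
    #include <cmath>

    // Greedy choice of the next bit position to code: of the bits that are
    // still uncoded, take the one whose estimate is furthest from 0.5.
    int pick_next_bit(const double p[8], unsigned coded_mask)
    {
        int best = -1;
        double best_score = -1.0;
        for (int b = 0; b < 8; ++b) {
            if (coded_mask & (1u << b)) continue;      // bit already coded
            double score = std::fabs(0.5 - p[b]);      // the |0.5 - p| score
            if (score > best_score) { best_score = score; best = b; }
        }
        return best; // the decoder sees the same p[] and mask, so it makes the same choice
    }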

    Some results on world95.txt:
    Old version, unoptimised
    Code:
    output: 490b, 47,88, 47,88% compression - 2917kb left - n: 120 w: 1150
    output: 940b, 45,94, 45,94% compression - 2916kb left - n: 126 w: 2290
    output: 1350b, 43,96, 43,96% compression - 2915kb left - n: 127 w: 3268
    output: 1781b, 43,5, 43,5% compression - 2914kb left - n: 128 w: 4320
    output: 2219b, 43,35, 43,35% compression - 2913kb left - n: 130 w: 5426
    output: 2598b, 42,29, 42,29% compression - 2912kb left - n: 130 w: 6350
    output: 2964b, 41,36, 41,36% compression - 2911kb left - n: 133 w: 7241
    output: 3332b, 40,67, 40,67% compression - 2910kb left - n: 133 w: 8113
    output: 3660b, 39,72, 39,72% compression - 2909kb left - n: 133 w: 8904
    output: 4010b, 39,16, 39,16% compression - 2908kb left - n: 135 w: 9729
    New version, optimised
    Code:
    output: 494b, 48,27, 48,27% compression - 2917kb left - n: 121 w: 1034
    output: 941b, 45,95, 45,95% compression - 2916kb left - n: 125 w: 2020
    output: 1351b, 43,99, 43,99% compression - 2915kb left - n: 128 w: 2912
    output: 1780b, 43,48, 43,48% compression - 2914kb left - n: 130 w: 3848
    output: 2218b, 43,33, 43,33% compression - 2913kb left - n: 135 w: 4818
    output: 2600b, 42,33, 42,33% compression - 2912kb left - n: 135 w: 5634
    output: 2978b, 41,56, 41,56% compression - 2911kb left - n: 135 w: 6415
    output: 3355b, 40,97, 40,97% compression - 2910kb left - n: 135 w: 7215
    output: 3682b, 39,96, 39,96% compression - 2909kb left - n: 135 w: 7869
    output: 4035b, 39,41, 39,41% compression - 2908kb left - n: 137 w: 8580
    ('n' stands for neutral (p = 0.5) and 'w' stands for wrong (error > 0.5). The compression ratios should really be the other way around, but I personally prefer it this way.)


    Why oh why don't the extra 1000 correctly predicted bits give me a better ratio? That's the question!

  2. #2
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    You tuned the wrong metric - the correct metric would be lowest overall entropy:

    sum k=1..K -log (a_k + b_k p_k)

    a_k = 1 - y_k
    b_k = 2y_k - 1

    where

    k - current coding step
    y_k - actual encountered bit
    p_k - estimation for P(y_k=1)
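
    In code this is just the total code length of the bits you actually coded (a rough sketch; base-2 log, so the result is in bits):
    Code:
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // y[k] = actual bit coded at step k, p[k] = estimate of P(y[k] = 1) before coding it
    double total_code_length(const std::vector<int>& y, const std::vector<double>& p)
    {
        double sum = 0.0;
        for (std::size_t k = 0; k < y.size(); ++k) {
            double a = 1.0 - y[k];            // a_k = 1 - y_k
            double b = 2.0 * y[k] - 1.0;      // b_k = 2*y_k - 1
            sum += -std::log2(a + b * p[k]);  // -log of the probability given to the actual bit
        }
        return sum;                           // lower is better: roughly the compressed size in bits
    }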
    Last edited by toffer; 22nd September 2009 at 18:33.

  3. #3
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I think this might help you. But then again it might not. In the past I have played with pure arithmetic bit compressors, where you work one bit at a time, and with plain pure 256-symbol compressors. You would think that doing it a bit at a time might give better compression, since you get 8 tries at getting the compression right.
    At first I believed the bit-at-a-time approach was better. But then I noticed that the bit-at-a-time results actually depend on how close the bit patterns of the symbols are to each other.
    In both cases I used pure arithmetic coding, which means the actual symbol order is not important: if you take, for example, the BWTS or UNBWTS or any other permutation of the file, it compresses to the same size. Those sizes differ depending on whether you go a bit at a time or a byte at a time.
    This is not the case with most arithmetic compressors people use, since the internal state carried is usually quite small and rounding effects start to show up.

    With the pure 256-symbol arithmetic compressor, if the original file was the same except that different symbols were used, the file would still compress to the same length.

    However, with the bit-at-a-time one you get a different compressed length when a different set of symbols is used. Then I noticed it was possible to weight the bit symbols by position in the byte so that you get the same results as with the pure 256-symbol compressor.

    Even though it is slower, I like doing it a bit at a time. My current version of arb255 seems better for most files than a pure 256-symbol arithmetic compressor. However, when I first started testing it against other so-called pure arithmetic compressors, it seemed to do worse.

    That is, until I realized that a pure arithmetic compressor has to compress all permutations to the same size. So when I find a pure arithmetic compressor that tests better for a given file, I check how well it compresses the BWTS or UNBWTS of it - and then mine beats it.

  4. #4
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts
    Quote Originally Posted by toffer View Post
    You tuned the wrong metric - the correct metric would be lowest overall entropy:

    sum k=1..K -log (a_k + b_k p_k)

    a_k = 1 - y_k
    b_k = 2y_k - 1

    where

    k - current coding step
    y_k - actual encountered bit
    p_k - estimation for P(y_k=1)
    Thanks Toffer. The only problem is that y_k isn't available beforehand. As I understand it now, better predictions do not necessarily give a lower entropy?
    I will make a cheating version with your algorithm to compare the entropy-optimised version with the version I have made. Maybe the optimal bit sequence of the past is useful for the future.

  5. #5
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Of course during actual processing the bit is unknown. But for tuning it's no problem.

    Maybe you should clarify what precisely you do.

    Better predictions give better compression, since log(.) is a strictly monotonic function. If your metric only takes |0.5-p_k| into account, it doesn't actually measure anything compression related - only the "skew" of p_k. Calculating the entropy requires knowing the actual symbol, since you need to know its code length (which is -log( prob_of_symbol )).

    Of course there's correlation between processed data and new data. Otherwise modeling wouldn't work. Reusing that is essential.

    EDIT: If you want to evaluate a prediction's quality, you need to compare the estimate with the actual outcome - that is the error in your logic.
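
    For example (toy numbers - a skewed estimate only pays off if it turns out to be right):
    Code:
    #include <cmath>
    #include <cstdio>

    // bits spent coding bit y with estimate p = P(y = 1)
    static double code_length(int y, double p) { return -std::log2(y ? p : 1.0 - p); }

    int main()
    {
        std::printf("p=0.9, bit=1: %.2f bits\n", code_length(1, 0.9));  // ~0.15 bits - skew pays off
        std::printf("p=0.9, bit=0: %.2f bits\n", code_length(0, 0.9));  // ~3.32 bits - skew backfires
        std::printf("p=0.5, either: %.2f bits\n", code_length(1, 0.5)); // exactly 1 bit - no gamble
        return 0;
    }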
    Last edited by toffer; 22nd September 2009 at 19:23.

  6. #6
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts
    My actual learning model is not the same thing as my very simple 'bit-sequence model'. The actual model is fine-tuned for optimising entropy and should not be confused with the 'bit-sequence' experiment.

    The 'bit-sequence' experiment only chooses which of the 8 bits of the byte to predict next. (8 bits, because my byte-wise model delivers this information.)
    For example:
    Bit 0: p = 0.5;
    Bit 1: p = 0.4;
    ...
    Bit 7: p = 0.9;
    I first choose bit 7 for actual compression, since I am 90% sure of its value, and after that the probabilities of the other bits are recalculated.

    After recalculation 7 bits are left:
    Bit 0: p = 0.25;
    Bit 1: p = 0.45;
    ...
    Bit 6: p = 0.5;

    The next most predictable bit is bit 0, because I am 75% sure that bit 0 will be 0, and the process goes on until all bits are coded.

    It feels like this should decrease the overall entropy instead of increasing it?

  7. #7
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The quality of that process heavily depends on the model's confidence. A skewed model estimate doesn't mean it's correct. And again, remember the "EDIT" I wrote in my last post.

    Afaiu your method tries to dynamically construct a symbol decomposition.

    I'd suggest changing the metric to entropy and estimating each bit's entropy to select the decomposition. Along the way you still need to estimate the probabilities.

    How do you perform probability estimation?

  8. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,334
    Thanks
    209
    Thanked 1,008 Times in 533 Posts
    It's the worst case if you determine the bit order adaptively...
    the byte statistics would become a complete mess.
    But it's not that trivial even if you'd try to use a static order
    different from 7-to-0 - it's only possible to directly compute
    the best order if the model is very simple.
    Usually brute-force search is the only real solution.
    But then, it is actually possible to acquire better compression
    with a different alphabet decomposition...
    One example of that is
    http://ctxmodel.net/files/BWT_reorder_v2.rar
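
    For example, a brute-force search over static orders could look like this (just a sketch - the cost model here is a toy adaptive bit model purely to keep it self-contained; a real search would measure the size produced by the actual coder):
    Code:
    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // toy cost model: adaptive counts per (coding step, bits of the byte coded so far)
    double estimate_bits(const std::vector<uint8_t>& data, const std::array<int, 8>& order)
    {
        std::vector<std::array<uint32_t, 2>> counts(8 * 256, {{1, 1}});
        double total = 0.0;
        for (uint8_t byte : data) {
            uint32_t partial = 0;                          // bits of this byte coded so far
            for (int i = 0; i < 8; ++i) {
                int bit = (byte >> order[i]) & 1;
                std::array<uint32_t, 2>& c = counts[i * 256 + partial];
                double p1 = double(c[1]) / (c[0] + c[1]);  // adaptive estimate of P(bit = 1)
                total += -std::log2(bit ? p1 : 1.0 - p1);  // cost of the actual bit
                ++c[bit];
                partial |= uint32_t(bit) << order[i];
            }
        }
        return total;
    }

    std::array<int, 8> find_best_order(const std::vector<uint8_t>& data)
    {
        std::array<int, 8> order = {0, 1, 2, 3, 4, 5, 6, 7}, best = order;
        double best_cost = estimate_bits(data, order);
        while (std::next_permutation(order.begin(), order.end())) { // all 8! = 40320 orders
            double cost = estimate_bits(data, order);
            if (cost < best_cost) { best_cost = cost; best = order; }
        }
        return best;
    }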

  9. #9
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts
    Quote Originally Posted by toffer View Post
    How do you preform probability estimation?
    A context mixer taking into account the following properties:
    5 bits for up to order-32 matches
    3 bits for 8 different levels of p for the sum of every match in a particular order
    3 bits for the position of a match. Closer is mostly better, but sometimes it isn't. (works very well!)

    11 bits in total, and behind that a tree structure blending all these contexts into the actually used prediction.
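
    Purely as an illustration (the exact bit layout here is just an example, not the real one), such an 11-bit context index could be packed like this:
    Code:
    #include <cstdint>

    // 5 bits match order (0..31), 3 bits quantised p level (0..7), 3 bits position class (0..7)
    uint32_t make_context(uint32_t match_order, uint32_t p_level, uint32_t pos_class)
    {
        return (match_order & 31u) | ((p_level & 7u) << 5) | ((pos_class & 7u) << 8);
    }

    // e.g. a table of 2^11 = 2048 bit predictors indexed by this context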

  10. #10
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    It's the worst case if you determine the bit order adaptively...
    the byte statistics would become a complete mess.
    But it's not that trivial even if you'd try to use a static order
    different from 7-to-0 - it's only possible to directly compute
    the best order if the model is very simple.
    Usually brute-force search is the only real solution.
    But then, it is actually possible to acquire better compression
    with a different alphabet decomposition...
    One example of that is
    http://ctxmodel.net/files/BWT_reorder_v2.rar
    You're saying that my prediction model gets screwed up by the adaptive bit sequence? That actually makes sense.

  11. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    So you actually want to do such a construction at every byte coding step - or determine a decomposition once for a data set and then use it?

    You just described some contexts used in the estimation for (?) a secondary model? What do your counters look like?

    EDIT:

    To clarify, I think such a construction might actually help, as long as the whole process isn't recurrent: probability estimation influences the decomposition AND the decomposition influences the estimation.

    And you have to take the whole symbol into account: not just pick a certain bit position based only on the next step's outcome, but construct the whole decomposition with regard to the whole outcome. That could be described as an analogy of greedy vs. optimal parsing. But I cannot think of an easy (and efficient) way to solve that problem. Also, only reordering bit positions limits the decomposition to some msb<->lsb shuffling.
    Last edited by toffer; 22nd September 2009 at 22:15.

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    In theory, the order in which the bits are coded should not matter. If a and b are 2 bits, then P(ab) = P(a)P(b|a) = P(b)P(a|b). Of course in practice it does matter depending on the details of how you compute the probabilities. You can do experiments in which you reorder the bits in each byte before compression and test with lots of different compressors.
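
    For example, with exact probabilities the total cost comes out the same either way (a toy calculation):
    Code:
    #include <cmath>
    #include <cstdio>

    int main()
    {
        // an arbitrary joint distribution over two bits (a, b), indexed as a*2 + b
        double joint[4] = {0.40, 0.10, 0.15, 0.35};
        int a = 1, b = 1;                              // the pair actually coded
        double Pab = joint[a * 2 + b];                 // P(a = 1, b = 1)
        double Pa  = joint[2] + joint[3];              // P(a = 1)
        double Pb  = joint[1] + joint[3];              // P(b = 1)

        double a_first = -std::log2(Pa) - std::log2(Pab / Pa); // code a, then b given a
        double b_first = -std::log2(Pb) - std::log2(Pab / Pb); // code b, then a given b
        std::printf("a first: %.4f bits, b first: %.4f bits\n", a_first, b_first);
        return 0;                                      // both equal -log2(P(a=1, b=1))
    }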

  13. #13
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    49
    Thanks
    0
    Thanked 3 Times in 2 Posts
    After reconfiguring my model, compression is as good as before - not better, but about the same. Shelwien and Matt both seem to be right in their statements.

