
Thread: Is this how to practically implement Arithmetic Coding?

  1. #151 - Shelwien (Administrator)
    Because deterministic contexts (ones where the same symbol always appears) are common in text and deserve a higher weight than the default one.

    In theory it would be best to use 256 different coefs depending on the number of unique symbols,
    but optimizing these coefs takes time, and it's not how a proper weight function is made anyway.

  2. #152 - Self_Recursive_Data (Member)
    But "deterministic contexts" is the offset...high freq/count means much much higher weight than really is. The first 2 coefs I thought are unique symbol total...

    weight for a model is based on:
    total counts
    total symbols
    high/low totals get exponentially more/less
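
    A purely hypothetical sketch of such a weight shape, only to illustrate the qualitative behaviour described above (higher totals and fewer unique symbols get disproportionately more weight); the function and constants are made up and are not green's actual coefs:
    Code:
    # Hypothetical weight shape, not green's actual formula: contexts with
    # more observations and fewer unique symbols (more deterministic)
    # receive a disproportionately higher weight.
    def weight(total_count, unique_symbols, a=1.5, b=2.0):
        determinism = 1.0 / unique_symbols          # 1.0 if only one symbol was ever seen
        return (1 + total_count) ** a * determinism ** b

    print(weight(50, 1))    # deterministic context: large weight
    print(weight(50, 10))   # noisy context with the same total: much smaller weight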

  3. #153 - Self_Recursive_Data (Member)
    OK, the layer-dedicated adjustable roof is working....

    ....note that while I hand-tuned the many params for 2 hours, they are not optimally found

    20,000 enwiki8 bytes input:
    green: 6,849
    my last: 6,829
    my new: 6,811

    100,000 enwiki8 bytes input:
    Green: 31,518
    my last: 31,363
    my new: 31,266

  4. #154 - Self_Recursive_Data (Member)
    200,000 bytes:
    mine: 60,057
    green: 60,321

  5. #155 - uhost (Member)
    After compressing, some files end up with a size greater than the original files? Can you / anyone explain this?

  6. #156 - Shelwien (Administrator)
    If some files are compressed, some other files have to be expanded.
    Simple compression algorithms tend to add more redundancy in these cases.
    Good compressors also tend to switch to storing a file when they notice that it's incompressible.
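
    A minimal sketch of the counting argument behind this (illustration only): there are more distinct n-bit files than there are strictly shorter outputs, so any lossless compressor that shrinks some files must expand others.
    Code:
    # Counting argument: 2**n distinct n-bit inputs, but only 2**n - 1
    # distinct outputs shorter than n bits, so at least one input cannot
    # be mapped to a shorter output without a collision.
    def shorter_outputs(n_bits):
        return sum(2 ** k for k in range(n_bits))   # = 2**n_bits - 1

    n = 16
    print(2 ** n, "possible inputs of", n, "bits")
    print(shorter_outputs(n), "possible shorter outputs")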

  7. Thanks:

    uhost (17th March 2020)

  8. #157 - uhost (Member)
    Which is the best lossless compression algorithm? How much can it compress mp4 files, or any compressed data?
    Can you tell?

  9. #158 - Shelwien (Administrator)
    > which is best lossless compression algorithm ?

    CM (context mixing). Specific implementation - http://www.byronknoll.com/cmix.html

    > how much compress mp4 files

    .mp4 files usually contain h264 video / aac audio / mp4 index, maybe subtitles, maybe multiple audio tracks.
    So it's actually a composite task; results depend on the specific encoding method.
    A 1-5% gain should be practically possible, up to 15-20% in theory, but somebody has to write a proper h264 recompressor for that.

    https://encode.su/threads/2629-Compr...ll=1#post50652

    > or any compressed data ?

    That really depends on the specific compression algorithm.
    https://encode.su/threads/2742-Compr...ll=1#post52493

  10. Thanks:

    uhost (17th March 2020)

  11. #159 - Self_Recursive_Data (Member)
    Added the total count squash...

    20,000 enwiki8 bytes in:
    green: 6,849
    my last: 6,829
    my last: 6,811
    my new: 6,793

    100,000
    Green: 31,518
    my last: 31,363
    my last: 31,266
    my new: 31,150

    200,000
    green: 60,321
    my last: 60,057
    my new: 59,932

  12. #160 - Self_Recursive_Data (Member)
    6,791 now for 20,000
    59,915 for 200,000

    now for next steps

  13. #161 - Self_Recursive_Data (Member)
    Added a global weight for orders.

    20,000
    6,737

    200,000
    59,457

  14. #162 - Self_Recursive_Data (Member)
    Green, however, is still faster, compresses slightly better, and uses less RAM. Later I will fix the RAM like you did, and the speed as well. The compression, well, I came pretty close for now.

  15. #163 - uhost (Member)
    In which form is the data taken for compression: is it binary? hex? decimal? or a string?
    Can you explain?
    Last edited by uhost; 21st March 2020 at 18:12.

  16. #164 - Self_Recursive_Data (Member)
    The neural predictor's predictions have probabilities, and the one for the symbol you need to encode decides the next part of the decimal string made during Arithmetic Coding. Once the decimal string is finished, at the end of the file, it is transformed into a bin file (binary) by converting it to a "number" in binary form. It's like Huffman Coding but does a near-perfect job of what Huffman Coding tries to do.
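
    A toy sketch of that interval narrowing (illustration only, with a made-up fixed distribution; real coders use integer ranges and renormalization instead of floats):
    Code:
    # Toy arithmetic coder: each symbol narrows [low, high) in proportion
    # to its probability; any number in the final interval identifies the
    # whole symbol sequence, and its binary expansion is the output file.
    from itertools import accumulate

    probs = {'a': 0.33, 'b': 0.10, 'c': 0.57}           # model probabilities
    cum = dict(zip(probs, accumulate(probs.values())))  # cumulative upper bounds

    def encode(symbols):
        low, high = 0.0, 1.0
        for s in symbols:
            width = high - low
            high = low + width * cum[s]
            low = high - width * probs[s]
        return low, high      # transmit any value inside [low, high)

    print(encode("cab"))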



    My AC wasn't faulty in the end, it was just an input embedding issue, no real issue.

    For the first 100,000 bytes of enwiki8
    Green gets it to 29,108 bytes
    Mine gets it to 28,572 bytes

    For the first 1,000,000 bytes of enwiki8
    Green gets it to 253,800 bytes
    Mine gets it to 251,235 bytes

    For the first 10,000,000 bytes of enwiki8
    Green gets it to 2,331,508 bytes
    Mine gets it to 2,319,337 bytes

    Note my parameters can still be tweaked to get that lower a bit.

  17. Thanks:

    uhost (22nd March 2020)

  18. #165 - Self_Recursive_Data (Member)
    Question. After Arithmetic Coding has finished the complete file, you end up between the final high and low bounds. You can shave off a few bits at the end. Ex. 0.451-0.372 becomes 0.38.

    But how many bits can this shaving of the end really save you? ~128 bits at most, no?

    Is coding it worth it?

  19. #166 - Shelwien (Administrator)
    > But how many bits can this shaving the end save you really? ~128 bits at most, no?

    Well, if you're using a rangecoder with 128 bit precision, then it could be 128 bits I guess.

    Common rc overhead is like this:
    - pad at start to avoid extra condition on carry flush: 1 byte
    - decoder lookahead bytes flushed to avoid tail checks: 2-4 bytes depending on rc renorm threshold
    - EOF/filesize coding: 3-4 bytes

    But it's also possible to find some use for the value in the final interval -
    filesize (can be used instead of EOF) or crc % range maybe.

    It's also possible to save any (variable) number of tail bytes,
    when it's likely for the model to output all 0s or all 1s at the end
    (which happens for some data types, e.g. executables tend to have long zero
    padding at the end).

    > Is coding it worth it?

    For slow coders (like paq or cmix or nncp) it makes sense to have an entropy coder
    with the lowest possible redundancy - it's kinda dumb when you save 5 bytes
    by adding a whole new submodel, while your rangecoder keeps encoding
    a megabyte of zeroes to 400 bytes because of low precision:
    https://encode.su/threads/2515-mod_p...ll=1#post51672

    But for speed-optimized coders it may be better to keep the overhead,
    because extra checks are not worth it.

    > After Arithmetic Coding is finished the complete file,
    > you end up between the final high and low bounds.
    > You can shave off a few bits at the end. Ex. 0.451-0.372 becomes 0.38.

    The volume of encoded information is usually not bit-aligned,
    so it's usually impossible to completely avoid any redundancy.

    I have some low-overhead rangecoders here: https://encode.su/threads/3084-Simpl...ll=1#post59643
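
    As a rough illustration of the scale involved (a sketch, not one of the coders above): naming a point inside the final interval [low, high) takes about ceil(-log2(high-low))+1 bits, so for the 0.451-0.372 example that is only around 5 bits, which bounds what tail shaving can save.
    Code:
    # Bits needed to name some point inside the final interval [low, high):
    # about ceil(-log2(high - low)) + 1.
    import math

    def tail_bits(low, high):
        return math.ceil(-math.log2(high - low)) + 1

    print(tail_bits(0.372, 0.451))   # the interval from the example above -> 5 bits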

  20. #167 - Self_Recursive_Data (Member)
    That doesn't answer my question.

    Arithmetic Coding produces a very long decimal number:
    0.3483856347597978987972181615551535345534625637078345232143287399295393590997145

    The tail end of this number can be cut back / shaved off, because it is unneeded:
    0.3483856347597978987972181615551535345534625637078345232143287399295393

    But it only improves compression by an extremely small amount. Programming this may cost more than it saves!

    Is it worth it? How much does chopping the tail save? And how much does coding it cost?

    The answer I expect from you is either "No, it'd be stupid to implement it" or "Yes, it actually improves compression because..."
    Last edited by Self_Recursive_Data; 30th March 2020 at 20:35.

  21. #168 - Shelwien (Administrator)
    Please read my previous post.

    In short, yes, I implemented various versions of AC tail cutting; it saves up to 8 or so bytes per file.
    And no, unless you work with small random-access blocks, it's usually not worth the trouble.

  22. Thanks:

    Self_Recursive_Data (30th March 2020)

  23. #169 - Self_Recursive_Data (Member)
    If we add SEE to Green, and Green already compresses enwiki8 to 21.8MB, how much would it improve this? 19.6MB?

    Now, say we add SSE, too. How much would it improve this "19.6MB"? 18MB?



    I have another question about SSE. It updates a table based on prediction error, to refine the prediction, basically. But isn't this the same thing as Local Text, i.e. giving more weight to models that have been predicting well recently? In other words, an adaptive global model weight favoring models based on error. Or is SSE similar but meant for a specific context?

  24. #170 - Shelwien (Administrator)
    > If we add SEE to Green, and Green already compresses enwiki8 to 21.8MB,

    How much is "21.8MB" in bytes? 22,858,956?

    > how much would it improve this? 19.6MB?

    SEE only makes sense in PPM - it implies a completely different weighting method.

    Code:
    http://mattmahoney.net/dc/text.html#1839
    ppmd J          -m256 -o10 -r1  21,388,296  // Uses SEE
    ppmd_sh9        16 1863         20,784,856
    ppmonstr J      -m1863 -o16     19,040,451  // Uses SSE

    > Now, say we add SSE, too. How much would it improve this "19.6MB"? 18MB?

    Maybe; it depends on the specific implementation.
    SSE is not a fixed function - it's any statistical model that uses
    the primary prediction as context.

    > But isn't this the same thing as Local Text for giving more weight
    > to Models that have been predicting great recently?
    > I.e. adaptive Global Model weight favoring models based on error.
    > Or, is SSE similar but meant for specific context?

    Normally SSE is orthogonal to weights and primary contexts.
    It takes the already mixed primary predictions and refines them
    using statistics in the context of these predictions (and possibly some others).
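
    A minimal sketch of such an SSE stage (illustration of the idea only, not green's or paq's actual code; the bucket count and the blend are arbitrary choices): quantize the mixed primary prediction into a bucket, optionally together with a small extra context, keep adaptive bit counts there, and return a refined probability; after coding, the same cell is updated with the actual bit.
    Code:
    # Minimal SSE stage sketch: the mixed primary prediction p is quantized
    # into a bucket (optionally joined with an extra context); each bucket
    # keeps adaptive 0/1 counts and yields a refined probability.
    class SSE:
        def __init__(self, buckets=64, contexts=1):
            self.buckets = buckets
            self.t = [[[1, 1] for _ in range(buckets)] for _ in range(contexts)]

        def _idx(self, p):
            return min(self.buckets - 1, int(p * self.buckets))

        def p(self, p_primary, ctx=0):
            n0, n1 = self.t[ctx][self._idx(p_primary)]
            p_secondary = n1 / (n0 + n1)
            return (p_primary + 3 * p_secondary) / 4   # arbitrary blend of the two estimates

        def update(self, p_primary, bit, ctx=0):
            self.t[ctx][self._idx(p_primary)][bit] += 1

    sse = SSE()
    # per bit: p = sse.p(p_mixed); encode the bit with p; then sse.update(p_mixed, bit)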

  25. #171 - Self_Recursive_Data (Member)
    > How much is "21.8MB" in bytes? 22,858,956?

    Green VS enwiki8. If I remember correctly, Green achieved approx. 21,800,000 bytes. And I'm wondering if you can add *both* SSE & SEE in the same algorithm?

    Using the 21.8MB mark, how much would SEE bring it down by? 1MB improvement? And what about SSE? 0.34MB improvement? This question will let me know how useful SSE vs SEE is.

  26. #172 - Shelwien (Administrator)
    > Green VS enwiki8. If I remember correctly, Green achieved approx. 21,800,000 bytes.

    Well, existing implementations of bytewise CM/PPM with SSE (ppmonstr,durilca,ash)
    seem to reach 18,0xx,xxx only with preprocessing (DRT or similar) -
    their results are around 19,0xx,xxx normally.

    > And I'm wondering if you can add *both* SSE & SEE in the same algorithm?

    Yes, it's exactly what ppmonstr/durilca do.

    But you don't need to encode escapes in a CM; green doesn't even keep escape statistics.

    > Using the 21.8MB mark, how much would SEE bring it down by? 1MB improvement?

    You'd need a PPM for that, but yes.

    > And what about SSE? 0.34MB improvement? This question will let me know how useful SSE vs SEE is.

    I repeat, there's no one specific SSE implementation.
    PAQ has a bitwise SSE, while in theory it's also possible to precompute
    the whole byte probability distribution, and with it implement a unary SSE
    like in ppmonstr.

    SSE is just a secondary model. It's unknown what kind of model it would be,
    so it's hard to predict its performance.
    A wrong SSE implementation can easily make compression worse too.

  27. #173 - Self_Recursive_Data (Member)
    So SSE, how does it work... if I have 17 models to mix, do I use a short model, e.g. order-3, and use it as a goat for the rest? How does this improve compression? Can you explain it clearly? Why "secondary"... where do I pull it from, when, and what do I merge it into... I'm lost... can you elaborate with enough context to walk me through this?

    You can barely find it on the internet... can you write it up yourself below, so we can spread the instructions on the internet? Just rewrite it below clearly. We seem to lack new full explanations.....
    Last edited by Self_Recursive_Data; 31st March 2020 at 17:16.

  28. #174 - Self_Recursive_Data (Member)
    Is SSE (or is it SEE) just about updating the table error after prediction? Isn't that backpropagation?

  29. #175 - Shelwien (Administrator)
    https://encode.su/threads/541-Simple...ll=1#post64454

    > So SSE, how does it work... if I have 17 models to mix,
    > do I use a short model ex. order3 and use it as a goat for the rest?

    No. You mix the predictions from these 17 models, then pass the result to SSE,
    then use the SSE prediction for coding.

    It's also possible to have multiple SSE-like stages (paq does).

    > How does this improve compression. Can you explain it clearly?

    1) By sharing statistics for new contexts - a prediction of a context with just {'a':1}
    can be improved a lot with additional statistics for all such cases.
    2) By compensating for some persistent prediction errors.

    > Why "secondary"....

    Because it's a second model which uses the prediction of the first model as another context.

    > where do I pull it from, when, and what do I merge it to...

    There're multiple open-source implementations at this point.
    http://mattmahoney.net/dc/dce.html#Section_433

    > I'm lost...can you elaborate enough context to walk me through this?

    It's just a secondary statistical model, same as the first one,
    but able to use the primary prediction as an extra context.

    Now I added a ppmonstr-like SSE implementation to green.
    But it's possible to improve its compression by adding more contexts.

    > Is SSE (or is it SEE) just about updating the table error after prediction? Isn't that backpropagation?

    It would be backpropagation if you tried to modify weights to get a specific SSE output.
    Which is not really helpful for a CM (I tried), so it's not what normally happens.

    Yes, it's "about updating the table after prediction".
    Same as normal contextual statistics.

  30. #176 - Self_Recursive_Data (Member)
    So my understanding is:

    SEE was "originally" made for 1) escape probabilities and for 2) finding shared contexts, i.e. abc & wxy both see A follow, so you get extra statistics for free by using wxy.

    SSE was made for using the prediction as input to get another prediction.

    If that's the case, I get it all then. But I still wonder about SSE. The theory/reason why it works is unclear to me. For normal primary stats, the stats themselves are the % of the time each symbol appears, which is what assigns the short codes correctly. I don't see how you can improve the unimprovable stats, they speak for themselves... 20% of the time c is seen, 40% b... so b gets the code sized for 40%...
    Last edited by Self_Recursive_Data; 1st April 2020 at 16:29.

  31. #177 - Shelwien (Administrator)
    > SEE was "originally" made for
    > 1) escape probabilities and for

    Yes - since escape is frequently encoded multiple times per symbol in PPM,
    its handling is more important than that of any other symbol...
    and there's no perfect mathematical model for it.

    Also, SEE/SSE makes it possible to use extra contexts, unrelated to the
    normal prefix order-N contexts.
    For example, ppmd uses the escape flag from the previous symbol as SEE context.

    > SSE was made for using the prediction as input to get another prediction.

    Well, ppmd uses a linear scan to find a symbol anyway,
    so it's obvious that it's possible to encode "escape from symbol" flags
    for each symbol, rather than once per context.
    Then we can apply the SEE model to that.

    But the secondary estimation idea is helpful in any context really.
    It's just that it's impractical to feed it more than 2 probabilities at once,
    so some kind of binary decomposition has to be implemented
    to use SSE with non-binary statistics.

    > But I do wonder about SSE still.
    > What is the theory/reason it works is unclear.

    There're two main reasons:
    1) Persistent skews in primary estimation
    (n0/(n0+n1) isn't good for approximating a binomial distribution)
    2) Adding extra contexts to the prediction.
    It's also possible to take predictions from two submodels at once
    and feed them to a 2d SSE, thus making a binary mixer equivalent.

    > I don't see how you can improve the unimprovable stats, they speak themselves...
    > 20% time c is saw, 40% b....so b get 40% smaller code given...

    That's not true: a count of 0 doesn't mean that a symbol won't appear,
    and 100% c doesn't mean that the next symbol will also be c.
    There's a difference between prior and posterior probability estimations.
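
    A small illustration of that point (a sketch using a generic KT-style smoothed estimate, not ppmd's actual escape model): raw frequencies assign probability 0 to an unseen symbol, while a posterior-style estimate keeps a reserve for it; this gap between raw counts and a good estimate is the room that secondary estimation can exploit.
    Code:
    # Raw frequencies give p=0 to an unseen symbol (unencodable if it does
    # occur); a smoothed, KT-style posterior estimate keeps a small reserve.
    def raw(count, total):
        return count / total

    def kt(count, total, alphabet_size=3):
        return (count + 0.5) / (total + 0.5 * alphabet_size)

    # seen so far over alphabet {a, b, c}: a never, b 40 times, c 20 times
    for sym, n in (('a', 0), ('b', 40), ('c', 20)):
        print(sym, raw(n, 60), round(kt(n, 60), 3))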

  32. #178 - Self_Recursive_Data (Member)
    > That's not true, count 0 doesn't mean

    Oh yeah, if the tree doesn't see the whole file first, then it has uncertainty.

    But the way secondary modelling gets extra stats is still unclear to me. We take a set of primary predictions a, b, c: 33%, 10%, 57%, as context, and it gives us a new prediction: _?_ What does it predict? What is the observation? The predicted symbol that is revealed once transmitted?

  33. #179 - Shelwien (Administrator)
    > Oh ya, if the tree doesn't see the whole file first, then it has uncertainty.

    There's some uncertainty even if you have symbol counts for the whole file.
    Real files are not stationary, there're always some local deviations.

    > But the way secondary modelling gets extra stats is still unclear to me.
    > We take a set of primary prediction a, b, c: 33%, 10%, 57%, as context,
    > and gives us a new prediction: _?_ What does it predict?

    There's no way to implement secondary estimation with context
    of the whole byte distribution - preserving just symbol ranks
    requires log2(256!)=1684 bits of information.

    So SSE is always applied after binary decomposition
    Code:
    a: 33 -> 1
    b: 10 -> 01
    c: 57 -> 001
             ^^^-- p=1, no need to actually encode
             \\--- p=10/(10+57)=0.149
              \--- p=33/(33+10+57)=0.33

    So we'd look up SSE['a'][0.33] and see the secondary statistics for this case,
    then encode the (symbol=='a') flag using the SSE prediction,
    then update the SSE['a'][0.33] stats with the actual flag value.

    > What is the observation? The predicted symbol that is revealed once transmitted?

    Yes, but a bit/flag rather than the whole symbol.
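
    Spelled out as a sketch (the quantizer, table layout and initial counts here are arbitrary; only the decomposition and the SSE lookup/update follow the example above):
    Code:
    # The {a:33, b:10, c:57} example: unary decomposition into per-symbol
    # flags; each flag's primary probability picks an SSE cell, the flag is
    # coded with the refined estimate, and the cell is updated afterwards.
    counts = {'a': 33, 'b': 10, 'c': 57}
    sse = {}                                  # (symbol, bucket) -> [n0, n1]

    def bucket(p, n=32):
        return min(n - 1, int(p * n))

    def code_symbol(actual):
        remaining = sum(counts.values())
        for sym, c in counts.items():
            p_primary = c / remaining         # a: 0.33, then b: 10/67=0.149, then c: 1.0
            if p_primary >= 1.0:
                break                         # last candidate is certain, nothing to encode
            cell = sse.setdefault((sym, bucket(p_primary)), [1, 1])
            p_refined = cell[1] / (cell[0] + cell[1])
            flag = 1 if sym == actual else 0
            # ... encode `flag` with probability p_refined here ...
            cell[flag] += 1                   # update SSE with the observed flag
            if flag:
                return
            remaining -= c

    code_symbol('c')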

  34. #180 - Self_Recursive_Data (Member)
    So you look out for a secret code flag in the Arithmetic Code bin file ex.

    1010000101111010 [ 101000 ] 000

    ?
