Results 1 to 13 of 13

Thread: Simple rangecoder with low redundancy on short strings

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts

    Simple rangecoder with low redundancy on short strings

    Posted these in LawCounsels thread, decided to repost since nobody would see it there.

    Here's a simple bytewise order0 coder (with rc used by PPMd):
    http://ctxmodel.net/files/PPMd/subbotin.cpp

    Here I ported my rangecoder which was specifically modified for precision (cdm etc):
    http://ctxmodel.net/files/PPMd/sh_v1m.cpp

    http://nishi.dreamhosters.com/u/rangecoder_v0a.rar (exes and test script)
    (Modified sh_v1m rc/model to encode zero-size file to zero-size file.)

  2. Thanks (6):

    Bulat Ziganshin (19th March 2019),Cyan (18th March 2019),JamesB (19th March 2019),Mike (18th March 2019),Stefan Atev (18th March 2019),xinix (18th March 2019)

  3. #2
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    54
    Thanks
    13
    Thanked 20 Times in 15 Posts
    Thanks a ton! Can I ask in your opinion if this coder is suitable to use with large alphabets (say thousands or tens of thousands of symbols, most with low frequency)? What if I switch it to work with 64-bit?

    Also, how do you normally prefer these things to be attributed?

  4. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    > Also, how do you normally prefer these things to be attributed?

    Both are public domain.
    "subbotin.cpp" - rangecoder by Dmitry Subbotin, frontend/model by Eugene Shelwien.
    "sh_v1m.cpp" - rc/frontend/model by Eugene Shelwien.

    > Can I ask in your opinion if this coder is suitable to use with large alphabets
    > (say thousands or tens of thousands of symbols, most with low frequency)?

    Subbotin's rc supports "total" up to 2^16 (see BOT macro).
    sh_v1m up to 2^24.
    That's why counter rescale loop is commented out in sh_v1m.

    If its really necessary, I have a 128-bit rc in stegdict -
    https://github.com/Shelwien/stegdict.../sh_v1m128.inc
    it supports up to 2^56.
    Stegdict specifically uses it to encode a word choice, with support for billion-sized dictionaries.

    Its possible to support up to 2^31 total on 32-bit system using old-style arithmetic coder with bitwise output,
    but I don't think I have an optimized version of that.
    As an example, I have "beeari" here: http://compression.ru/sh/coders6a.rar

    > What if I switch it to work with 64-bit?

    Its possible, but not especially easy.
    For example, I had to write this: https://github.com/Shelwien/stegdict...muldiv_x86.inc
    to extend sh_v1m to 64-bit range.

  5. Thanks:

    Bulat Ziganshin (19th March 2019)

  6. #4
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    54
    Thanks
    13
    Thanked 20 Times in 15 Posts
    Thanks - I was just trying to figure out if large alphabets with very skewed distributions would suffer from too much re-normalization - I have no experience/intuition about range coders, only a vague theoretical worry...; It's funny that compilers have added __umul128 intrinsic but nobody's exposing the 128-by-64 divide that's kind of related... The stegdict may be more suitable for me, I am basically experimenting with a specific offline dictionary builder so my symbols will be basically "words".

  7. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    > I was just trying to figure out if large alphabets with very
    > skewed distributions would suffer from too much re-normalization -

    Yes. As I said, a normal rangecoder would not be able to encode
    a probability less than 1/(1<<24) (or even 1/(1<<16) for faster ones).

    But if we'd renormalize probabilities so that 1/(1<<24) would turn into 1/(1<<16),
    we'd also have a 16-bit codelength for this word instead of 24-bit,
    which is obviously redundant.

    > It's funny that compilers have added __umul128 intrinsic but
    > nobody's exposing the 128-by-64 divide that's kind of related...

    They added __int128 type, but it only exists in x64 mode.
    And I didn't see a 128/64 divide for x86 at all, so I had to write one.
    On x64 its not a problem, because DIV cpu instruction just normally does it.

    > The stegdict may be more suitable for me, I am basically experimenting with a
    > specific offline dictionary builder so my symbols will be basically "words".

    There might be still an error in flush handling in stegdict rc.
    It can be easily avoided by simply calling ShiftLow 8 times in rc_Quit,
    but atm there's this complex code that attempts to cut off tail bytes.

    So if you'd ever get a repeatable test case where decoding fails at the end,
    just give it to me and I'd fix that.

  8. #6
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    54
    Thanks
    13
    Thanked 20 Times in 15 Posts
    They added __int128 type, but it only exists in x64 mode.
    And I didn't see a 128/64 divide for x86 at all, so I had to write one.
    On x64 its not a problem, because DIV cpu instruction just normally does it.
    Yes, I meant that there is no intrinsic to access the RDX:RAX div through C/C++, you have to use inline assembly...; just an annoyance that in one case you don't need inline assembly, and in another one you do. I have stopped worrying about 32-bit platforms in my experiments and professionally long time ago, though I get it that not everybody has that luxury...

    Cheers!

  9. #7
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    504
    Thanks
    184
    Thanked 177 Times in 120 Posts
    I've been exploring some of your coders (coders6c.rar maybe? I forget the source) for the CRAM file format. One thing I explored is how well they work in other languages, specifically something simplistic like JavaScript.

    It turns out the 32-bit Schindler variant works brilliantly in Javascript, while the others are around 100x slower due to not natively supporting 64-bit arithmetic. In C, the Schindler code is maybe 5% slower or so, so I concluded it's an acceptable tradeoff.

  10. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,564
    Thanks
    775
    Thanked 687 Times in 372 Posts
    Actually, MSVC added __umul128 intrinsic, while gcc added __int128 type

  11. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    > (coders6c.rar maybe? I forget the source)

    http://compression.ru/sh/aridemo6.rar
    http://compression.ru/sh/coders6a.rar
    http://compression.ru/sh/coders6b.rar
    http://compression.ru/sh/coders6c2.rar

    Another noteworthy thing there is the "CLD-a" coder.
    It uses FPU registers to implement 64-bit integer arithmetics.
    Thing is, long double/extended type has 64 bits of mantissa, so its possible.
    The trick is to always add 1<<63 to any values to keep fractional bits away.
    (Otherwise there could be differences on different platforms/compilers - different rounding etc).

    > It turns out the 32-bit Schindler variant works brilliantly in Javascript, while the others are around 100x slower due to not natively supporting 64-bit arithmetic.

    Actually it would be possible to modify any coders with 64-bit "low" the same way:
      union {
    struct {
    uint low;
    uint Carry;
    };
    qword lowc;
    };
    ...
    lowc += tmp;


    to

       Carry += (low+tmp<low);
    low += tmp;


    As to muldiv functions, unless there's a 64/32 division, its easier to do range/=total first.
    Although for binary coding there're options without 64-bit arithmetics and precision loss:
        uint rnew = (range>>SCALElog)*freq;
    uint rnew = (qword(range)*(freq<<(32-SCALElog)))>>32; //
    uint rnew = ( range>=16*sTOP ) ? freq*(range>>SCALElog) : freq*(range>>(SCALElog-4))>>4;


    > Actually, MSVC added __umul128 intrinsic, while gcc added __int128 type

    In any case, the only real problem is 128/64 division using 32-bit arithmetics.
    Of course, a version with shift loop and subtractions would be simple, but also slow.
    But using 64/32 divisions to implement 128/64 is tricky.

  12. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    http://nishi.dreamhosters.com/u/rangecoder_v0b.rar

    Added another rc (fp_rc.cpp) with range defined as double.
    Supports freq-total up to 40 bits.

    This code:
    static const uint NUM     = 6;
    static const uint BITS = NUM*8;
    static const double sTOP = 0x01ULL<<(BITS-8);
    static const qword ThresL = 0xFFULL<<(BITS-8);
    static const qword ThresH = 0x01ULL<<(BITS-0);
    ...
    qword low;
    double code;
    double range;


    defines the number of compressed data bytes in "low" register. So 6*8=48.
    Min possible range value is "sTOP", so 1<<40 - it also defines the max total.
    double type has 52 mantissa bits, so we can't use more than that here,
    and non-byte-aligned low size requires additional data caching and realignment,
    so its better to use the next byte-aligned value.

    When SSE math is used, this coder seems to be compatible with different compilers.

    I also tried converting "low" to double, but ended up reverting it back,
    since it always has to be converted to int anyway (to extract compressed bytes),
    but there's no benefit in using float-point type for it.

    Well, CLD coder kept everything in FP, because it allowed to maintain rc state
    without ever flushing to memory, when the main algorithm is all scalar integer.
    But SSE2+ supports both integer and FP types, so there's kinda no point
    if it adds extra work.

  13. Thanks:

    Mike (22nd March 2019)

  14. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    http://nishi.dreamhosters.com/u/rangecoder_v0c.rar
    Code:
    768,771 book1    log2(maxtotal)
    435,595 fp32_rc   16
    435,398 fp64_rc   40
    435,398 sh_v1m    24
    435,402 sh_v2f    31
    435,908 subbotin  16
    Added rc_v2f - arithmetic coder with bitwise i/o and freqs up to 1<<31.
    Accidentally found it in one of CM coders on ctxmodel.net.
    Output has +4 bytes comparing to v1m/fp_rc, because there's no tail cutting - precision is actually similar.

    Also added a float-based rc (fp32_rc; double-based one renamed to fp64_rc).
    It needs freq renorm like "subbotin", since it also supports total only up to 1<<16,
    but 32-bit registers might be good for vectorization. I think I'd try it later.
    fp64 also can be vectorized, but obviously there'd be 2x less instances.

  15. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    http://nishi.dreamhosters.com/u/rangecoder_v0d.rar
    Code:
    full    resc64k name     freqbits tailcut
    ------- 435,901 subbotin  16      +
    ------- 435,595 fp32_rc   16      +
    435,398 435,581 fp64_rc   40      +
    435,398 435,581 sh_v1m    24      +
    435,398 435,580 sh_v2f    31      +  // 435580 is random luck - last byte was accidentally FF
    435,398 435,581 sh_128    56      +
    
    input size:   0 1 2 3 4
    fp32_rc       0 2 3 4 5 // len+1 is because of EOF, symbol probability is <1/256, thus extra byte is needed for byte alignment
    fp64_rc       0 2 3 4 5 // only with 8+ of same symbol it starts actually compressing files
    sh_128        0 2 3 4 5
    sh_v1m        0 2 3 4 5
    sh_v2f        0 2 3 4 5
    subbotin      2 2 3 4 5 // 0->2 is because of 2^32-1 range at init; other coders use 2^n
    - Added sh_128 from stegdict
    - Added tail cutting in subbotin and sh_v2f
    - Added test with freq rescaling for all coders
    - Added output size test for small files
    - Split main.inc from coders, since its always the same

  16. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,912
    Thanks
    291
    Thanked 1,272 Times in 719 Posts
    http://nishi.dreamhosters.com/u/rangecoder_v0e.rar
    Code:
    book1   resc64k  wcc386  resc64k  name     freqbits
    ------- 435,901  ------- 430,266  subbotin  16
    ------- 435,595  ------- 430,070  fp32_rc   16
    435,398 435,581  439,267 430,035  fp64_rc   40 // wcc386 resc64k results are smaller because data is non-stationary
    435,398 435,581  439,267 430,036  sh_v1m    24
    435,398 435,580  439,266 430,036  sh_v2f    31
    435,398 435,581  439,266*430,036* sh_128    56 // wcc386 not decodable; probably a math bug
    435,395 435,575  439,263 430,033  sh_v1n    24
    
    input size:   0 1 2 3 4 6 7 8 9 A B C  D
    subbotin      2 2 3 4 5 6 6 7 7 8 9 9 10
    fp32_rc       0 2 3 4 5 5 6 7 7 8 9 9 10
    fp64_rc       0 2 3 4 5 5 6 7 8 8 9 9 10
    sh_v1m        0 2 3 4 5 5 6 7 8 8 8 9 10
    sh_v2f        0 2 3 4 5 6 5 7 7 8 9 9 10
    sh_128        0 2 3 4 5 5 6 7 7 8 9 9  9
    sh_v1n        0 1 2 3 4 5 5 6 7 7 8 8  9
    - Found a bug in sh_128 - decoding error on wcc386; not fixed yet

    - Added sh_v1n = sh_v1m without explicit EOF coding.
    Instead, it generates code value at flush in a way to cause decoding uncertainty,
    which is used to detect EOF:
          code = rc.rc_GetFreq(total);
    for( x=0,low=0; low+freq[x]<=code; c++ ) low+=freq[x];
    if( (rc.EOFn>0) && (rc.rc_GetFreq1(total)<low) ) break; // EOF!
    rc.rc_Process(low,freq[x],total);

    Unfortunately access to freqtable at EOF point is necessary for rc flush.
    For coders like ppmd this can be tricky (though not impossible).


    http://nishi.dreamhosters.com/u/freqtable_v0.rar
    A different frontend for sh_v2f, with combinatoric model and extra freqtable file.
    main() can be extracted and combined with most coders here (except sh_v1n).

    For normal files this approach is actually better, since there's no handling of new symbols, or EOF,
    and its easy enough to compress the freqtable with a custom model.

    But for short strings its not really a good idea, since its hard to compress a 1024-byte freqtable to under 1 bit.

Similar Threads

  1. Achieving decent compression of individual short strings?
    By VioletGiraffe in forum Data Compression
    Replies: 17
    Last Post: 26th September 2019, 19:01
  2. Simple binary rangecoder demo
    By Shelwien in forum Data Compression
    Replies: 35
    Last Post: 17th June 2019, 16:21
  3. simple low order encoders (java)
    By Lucas in forum Data Compression
    Replies: 8
    Last Post: 30th August 2015, 10:11
  4. Simple Variable Order Rangecoder
    By david_werecat in forum Data Compression
    Replies: 62
    Last Post: 4th January 2012, 17:59
  5. Most efficient/practical compression method for short strings?
    By never frog in forum Data Compression
    Replies: 6
    Last Post: 1st September 2009, 04:05

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •