
Thread: Simple binary rangecoder demo

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts

    Simple binary rangecoder demo

    http://nishi.dreamhosters.com/u/rc_v0.rar

    fpaq0p is based on Matt's coder and is somewhat hard to find these days anyway,
    so I decided to post a new incarnation.

    So here we have:
    fpaq0p_mm.cpp - 88 lines, 242226 on book1bwt (SClog=15), 243048 (SClog=12)
    fpaq0p_sh.cpp - 88 lines, 241455 on book1bwt (SClog=15), 242947 (SClog=12)

    For reference:
    fpaq0p.cpp - 190 lines, 244093 on book1bwt (SClog=12)
    (its extra redundancy is due to the per-byte EOF flags; the coders above store the file length instead)

    Code:
    #include <stdio.h>
    
    enum { SCALElog=15, SCALE=1<<SCALElog, mSCALE=SCALE-1, hSCALE=SCALE/2 };
    
    template< int DECODE >
    struct Rangecoder {
      enum { NUM=4, sTOP=0x01000000U, Thres=0xFF000000U };
    
      typedef unsigned long long int qword;
      unsigned range, code, FFNum, Cache; qword lowc; FILE* f;
    
      void Init( FILE* _f ) {
        f = _f; range = 0xFFFFFFFF; lowc = 0; FFNum = 0; Cache = 0;
        // decoder preloads NUM+1 bytes; the extra byte compensates for the encoder's leading Cache byte
        if( DECODE==1 ) for( unsigned _=0; _<NUM+1; _++ ) (code<<=8)+=getc(f); 
      }
    
      void Quit( void ) { if( DECODE==0 ) for( unsigned _=0; _<NUM+1; _++ ) ShiftLow();  }
    
      // code or decode one bit; freq/SCALE is the probability of bit==0
      unsigned Process( unsigned freq, unsigned bit ) { 
        unsigned rnew = (qword(range)*(freq<<(32-SCALElog)))>>32;
        if( DECODE ) bit = (code>=rnew);
        bit ? range-=rnew, (DECODE ? code-=rnew : lowc+=rnew) : range=rnew;
        // byte-wise renormalization
        while( range<sTOP ) range<<=8, (DECODE ? (code<<=8)+=getc(f) : ShiftLow());
        return bit;
      }
    
      // output the top byte of low, delaying 0xFF bytes until any carry is resolved
      unsigned ShiftLow( void ) {
        unsigned Carry = unsigned(lowc>>32), low = unsigned(lowc);
        if( low<Thres || Carry ) {
          putc( Cache+Carry, f );
          for (;FFNum != 0;FFNum--) putc( Carry-1, f );  // flush pending 0xFF bytes (0x00 if carry)
          Cache = low>>24;
        } else FFNum++;
        return lowc = (low<<8);
      }
    };
    
    struct Predictor {
      short cxt;        // current bit-context: 1, then 1b7, 1b7b6, ... (also pads p[] to 256 shorts)
      short p[256-1];   // P(bit==0) estimates for the 255 contexts, scaled by SCALE
    
      Predictor() { for( unsigned i=0; i<sizeof(p)/sizeof(p[0]); i++ ) p[i]=hSCALE; byte(); }
      int P() const { return p[cxt-1]; }
      unsigned byte( void ) { unsigned c=cxt; cxt=1; return c; }  // return finished byte context, reset to 1
    
      void update( unsigned y ) {
        if( y ) p[cxt-1] -= (         p[cxt-1]  >> 5), cxt+=cxt+1;
        else    p[cxt-1] += ((SCALE - p[cxt-1]) >> 5), cxt+=cxt+0;
      }
    };
    
    template< class Predictor, class Rangecoder > 
    void proc( Predictor& p, Rangecoder& rc, unsigned y=0 ) { p.update( rc.Process( p.P(), y ) ); }
    
    unsigned flen( FILE* f ) {
      fseek( f, 0, SEEK_END );
      unsigned len = ftell(f);
      fseek( f, 0, SEEK_SET );
      return len;
    }
    
    int main( int argc, char** argv ) {
      unsigned i,c,f_len = 0;
      if( argc<4 ) return 1;
      FILE* f = fopen( argv[2], "rb" ); if( f==0 ) return 2;
      FILE* g = fopen( argv[3], "wb" ); if( g==0 ) return 3;
      Predictor p;
      if( argv[1][0]=='c' ) {
        f_len = flen( f );
        fwrite( &f_len, 1,4, g );
        Rangecoder<0> rc; rc.Init(g);
        for( i=0; i<f_len; i++ ) {
          c = getc(f); p.byte();
          proc(p,rc,c&0x80); proc(p,rc,c&0x40); proc(p,rc,c&0x20); proc(p,rc,c&0x10);
          proc(p,rc,c&0x08); proc(p,rc,c&0x04); proc(p,rc,c&0x02); proc(p,rc,c&0x01);
        } rc.Quit();
      } else {
        fread( &f_len, 1,4, f );
        Rangecoder<1> rc; rc.Init(f);
        for( i=0; i<f_len; i++ ) {
          proc(p,rc); proc(p,rc); proc(p,rc); proc(p,rc); proc(p,rc); proc(p,rc); proc(p,rc); proc(p,rc);
          putc( p.byte(), g );
        }
      }
      fclose( f );
      fclose( g );
      return 0;
    }

  2. The Following 2 Users Say Thank You to Shelwien For This Useful Post:

    Bulat Ziganshin (12th May 2016),encode (2nd April 2016)

  3. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Are you sure about that "cxt-1"?

  4. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    It doesn't affect speed, as it doesn't matter which constant offset is used (p or p-1),
    and the context 1,1a,1ab,1abc,... can never be <1, so the index cxt-1 is always valid.
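
    To illustrate with a throwaway snippet (mine, not from the package): the context starts each byte at 1 and is shifted left with the coded bit appended, so after k bits it's in [2^k, 2^(k+1)-1] and never drops below 1:

    Code:
    // Hypothetical trace of how Predictor::cxt evolves over one byte:
    // start of byte: cxt = 1
    // after bit b7:  cxt = 2*1 + b7          (2..3)
    // after b7,b6:   cxt = 2*(2+b7) + b6     (4..7)
    // ...
    // after 8 bits:  cxt = 256 + byte value  (256..511); byte() returns it and resets cxt to 1,
    //                and putc() keeps only the low 8 bits, i.e. the decoded byte.
    #include <stdio.h>
    int main( void ) {
      unsigned c = 0xB5, cxt = 1;
      for( int b=7; b>=0; b-- ) cxt += cxt + ((c>>b)&1);  // same update as in Predictor::update
      printf( "%u %u\n", cxt, cxt & 0xFF );               // prints 437 181 (0xB5 == 181)
      return 0;
    }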

  5. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    How about alignment? 255 is not as good as 256 element array.

  6. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    There's cxt instead of p[0] - it's there exactly for alignment.
    Well, it would be with #pragma pack(1) at least :)
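
    A quick check (a throwaway snippet of mine, not from the package) that the layout indeed comes out to 256 shorts = 512 bytes, with cxt effectively taking the slot of a p[0]:

    Code:
    #include <stdio.h>
    #pragma pack(1)   // not actually needed here, since all members are 2-byte shorts anyway
    struct Predictor { short cxt; short p[256-1]; };
    int main( void ) { printf( "%d\n", (int)sizeof(Predictor) ); return 0; }  // prints 512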

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Here's a multiplication-free version: http://nishi.dreamhosters.com/u/rc_v1.rar
    As expected, there's a compression loss (small for now): 23334805 -> 23335987 for enwik8bwt,
    but what's more interesting, it's actually much slower: 3.719s -> 7.078s (enwik8bwt encoding time).
    Well, with random access to 1M of tables it's no wonder.
    But how far would we have to reduce the size of the lookup tables to make it really faster?..

    Code:
    enum{ T_log_size = 1<<17 };
    uint T_log[T_log_size];
    
    enum{ T_exp_log=15, T_exp_size=1<<T_exp_log };
    uint T_exp[T_exp_size];
    
    #define M_LOG2E 1.44269504088896340736
    double log2( double a ) { return M_LOG2E*log(a); }
    
    double exp2( double a ) { return exp( a/M_LOG2E ); }
    
    int tables_init( void ) {
      int i; double S;
      S = 1<<(32-5);
      for( i=0; i<T_log_size; i++ ) T_log[i] = i>0 ? S*log2(i) : 0;
      S = 1.0/(1<<(T_exp_log-5));
      for( i=0; i<T_exp_size; i++ ) T_exp[i] = exp2(S*i);
      return 0;
    }
    
    uint Rangecoder::Process( uint freq, uint bit ) { 
      uint rnew = T_exp[ (T_log[rprec]+T_log[freq])>>(32-T_exp_log) ];
      if( DECODE ) bit = (code>=rnew);
      bit ? range-=rnew, (DECODE ? code-=rnew : lowc+=rnew) : range=rnew;
      while( range<sTOP ) range<<=8, (DECODE ? (code<<=8)+=getc(f) : ShiftLow());
      rprec = range>>SCALElog;
      return bit;
    }
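
    As a sanity check (my own hypothetical snippet, using the same table construction as the excerpt above), the log/exp round trip does reproduce (range>>SCALElog)*freq to within a small rounding error:

    Code:
    #include <stdio.h>
    #include <math.h>
    typedef unsigned int uint;
    
    enum { SCALElog=15, T_log_size=1<<17, T_exp_log=15, T_exp_size=1<<T_exp_log };
    static uint T_log[T_log_size], T_exp[T_exp_size];
    
    int main( void ) {
      double S = 1u<<(32-5);
      for( int i=0; i<T_log_size; i++ ) T_log[i] = i>0 ? uint( S*(log((double)i)/log(2.0)) ) : 0;
      double E = 1.0/(1<<(T_exp_log-5));
      for( int i=0; i<T_exp_size; i++ ) T_exp[i] = uint( exp(E*i*log(2.0)) );
    
      uint range=0xE1234567, freq=23456;
      uint rprec  = range>>SCALElog;
      uint exact  = rprec*freq;                                            // direct multiplication
      uint approx = T_exp[ (T_log[rprec]+T_log[freq])>>(32-T_exp_log) ];   // table round trip
      printf( "exact=%u approx=%u rel.err=%.4f%%\n",
              exact, approx, 100.0*fabs(double(approx)-double(exact))/exact );
      return 0;
    }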

  8. #7
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Did you look at CTW source? There is a method to reduce those tables for addition/subtraction in logistic domain. Maybe it helps.
    BIT Archiver homepage: www.osmanturan.com

  9. #8
    Member
    Join Date
    Apr 2010
    Location
    El Salvador
    Posts
    43
    Thanks
    0
    Thanked 1 Time in 1 Post
    Barney Stratford has a nice table-based approximation:

    Paper:
    http://citeseerx.ist.psu.edu/viewdoc...10.1.1.65.7401

    C++:
    http://pyramidworkshop.svn.sourcefor...15&view=markup

    You can easily raise precision there (it's only 256 bytes ). The coder is much faster than a regular one, but suffers maybe 0.5% rounding-loss.

  10. The Following User Says Thank You to Ethatron For This Useful Post:

    Bulat Ziganshin (12th May 2016)

  11. #9
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Ethatron View Post
    Barney Stratford has a nice table-based approximation:

    Paper:
    http://citeseerx.ist.psu.edu/viewdoc...10.1.1.65.7401

    C++:
    http://pyramidworkshop.svn.sourcefor...15&view=markup

    You can easily raise precision there (it's only 256 bytes ). The coder is much faster than a regular one, but suffers maybe 0.5% rounding-loss.
    I've read it quickly. It seems this is somewhat different from a "binary range coder". The article shows an approximation for eliminating the division when arithmetic coding multi-alphabet symbols. We are looking at coding binary alphabets in the logistic domain.

    @Shelwien: Even if it ends up as a slower version of the linear one, it would be cool to see how a "logistic counter" works for modeling
    BIT Archiver homepage: www.osmanturan.com

  12. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Ethatron: that coder compresses enwik8bwt to 67503738, then rar compresses its output to 35430684, which means ~50% redundancy.
    What am I doing wrong?

  13. #11
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I guess just the plain number shows us where the problem is: O0 with a stationary model on plain BWT output?
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  14. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,498
    Thanks
    741
    Thanked 664 Times in 358 Posts
    But how far would we have to reduce the size of the lookup tables to make it really faster?..
    Use a smaller table? Btw, multiply is just 4 ticks, almost the same as an L2 cache access. Much more important are (afaik) the 2 divisions in the decoder. I don't see any in your code, but maybe they are in ShiftLow?

  15. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > I guess just the plain number shows us where the problem is: O0 with
    > a stationary model on plain BWT output?

    You're right, though it halves the freqs occasionally, so it isn't 100% stationary.
    But anyway it looks more like a bitcode than actual arithmetic coding.
    Well, it may be a good replacement for adaptive Huffman at least, thanks Ethatron.

    > use smaller table?

    Sure, the question is what's the smart way to do that, with minimum compression loss.
    Actually it's what various patented arithmetic coders do, but their tradeoffs require
    so many branches that on x86 they're actually slower.
    (eg. http://www.compression.ru/sh/elscoder.rar)

    > btw, multiply is just 4 ticks, almost the same as L2 cache access.

    Unfortunately it's not as straightforward as it seems. There's a dependency chain
    around that multiplication (shift, mul, add/sub, cmp, branch), and the cpu
    gets stuck there until the result is ready, so there's a noticeable difference
    even between
    Code:
     // (qword(range)*(freq<<(32-SCALElog)))>>32 -> 5.0s to encode enwik8bwt
            mov       ecx, DWORD PTR [48+esp]   
            shl       eax, 17                   
            mul       ecx                       
            cmp       edx, 16777216             
            jae       .B1.242
    and

    Code:
    // unsigned rnew = (range>>SCALElog)*freq; -> 4.2s to encode enwik8bwt
            mov       eax, DWORD PTR [560+esp]  
            mov       edx, eax                  
            shr       edx, 15                   
            imul      edi, edx                  
            cmp       edi, 16777216             
            jae       .B1.244
    > much more imprtant are (afaik) 2 divisions in decoder.

    There're none in bitwise coders.
    There's basically nothing besides the multiplication and a branch,
    which is why getting rid of the multiplication is so important.

  16. #14
    Member
    Join Date
    Apr 2010
    Location
    El Salvador
    Posts
    43
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by osmanturan View Post
    I've read it quickly. It seems this is somewhat different from "binary range coder". This article shows an approximation for eliminating division in arithmetic coding multi-alphabet symbols.
    It's a multiplication & division free arithmetic coder. Only adds & shifts. It's model agnostic. You can drive PPMDII with it if you plug it in. You could do the optimization for a binary-only version quite easily. Here the C++ version is faster than a loop-free bsf-asm coder, so the 4-5 (possibly mispredicted) branches and the rest are quicker than mul+div+bsf+(mispredicted) branch. Not as fast as canonical static Huffman though.

  17. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    Here's a new version of static linear mix(o0,o1),
    with state machine counters (same fsm both for o0 and o1)
    http://nishi.dreamhosters.com/u/rc_v5.rar
    Results: 213230(book1bwt) 21310472(enwik8bwt)

    Also, I guess we need an index for these coders already

    bitwise o0 rangecoder demo / rc version comparison
    http://nishi.dreamhosters.com/u/rc_v0.rar

    bitwise o0 with multiplication-free rangecoder (via log/exp tables)
    http://nishi.dreamhosters.com/u/rc_v1.rar

    v2 was the version made to plot the compression(update rate) function for toffer:
    http://nishi.dreamhosters.com/u/book1bwt_wr.png
    http://nishi.dreamhosters.com/u/book1bwt_wr1.png
    http://nishi.dreamhosters.com/u/book1bwt_wr.txt

    static linear mix of o0 and o1 linear counters
    http://nishi.dreamhosters.com/u/rc_v3.rar

    linear mixing demo with BFA0-3 updates:
    http://nishi.dreamhosters.com/u/rc_v4.rar
    (see http://encode.su/threads/1159-Updati...mixer-with-BFA)

    linear mix(o0,o1) with fsm counters tuned for book1bwt:
    http://nishi.dreamhosters.com/u/rc_v5.rar
    see http://encode.su/threads/1137?p=22739#post22739
    also http://encode.su/threads/1152-State-machines

    linear mix(o0,o1) with coroutine i/o and unrolled rangecoder
    http://nishi.dreamhosters.com/u/o01_v0.rar

    interpolated SSE2(o0,o1) demo, includes
    mix_test_v2 (static/adaptive linear mixing) and
    mix_test_v3 (logistic mixing)
    http://nishi.dreamhosters.com/u/o01_v1.rar
    see http://encode.su/threads/1158-SSE2(o...-linear-inputs

    Rips of BSC's postcoders (logistic mix3 + interpolated SSE):
    http://nishi.dreamhosters.com/u/bsc240_qlfc_v0.rar (original)
    http://nishi.dreamhosters.com/u/bsc240_qlfc_v2.rar (refactored)
    see http://encode.su/threads/1150-bsc-s-qlfc-postcoder

    New vectorized rangecoding demo (batch renormalization for 8 bits at once)
    http://nishi.dreamhosters.com/u/vecrc_v0.rar
    Old proof-of-concept for the same thing:
    http://compression.ru/sh/parcoder.rar

    Demo for coding with restricted alphabet (ie only A-Z in output file)
    (via arithmetic decoding with zero probability for unwanted symbols)
    http://nishi.dreamhosters.com/u/marc_v1.rar
    Simple demo of coroutine class used in marc_v1:
    http://nishi.dreamhosters.com/u/fibo_1.rar
    Old version (18-01-2003), with a real generalized base-N rangecoder:
    http://compression.ru/sh/marcdemo.rar

  18. The Following User Says Thank You to Shelwien For This Useful Post:

    Bulat Ziganshin (12th May 2016)

  19. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Any ideas on how to optimize/generate that fsm table? Brute-force is not really possible... Can you tell how you did that?

  20. #17
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Quote Originally Posted by Shelwien View Post
    (qword(range)*(freq<<(32-SCALElog)))>>32 -> 5.0s to encode enwik8bwt
    unsigned rnew = (range>>SCALElog)*freq; -> 4.2s to encode enwik8bwt
    The first one gives better compression,
    but why is the first one faster than (freq*uint64_t(range)) >> SCALElog ?
    I tested it, but stuck to your variant.

    I find the name fsm-counter irritating - every counter has a finite number of states at limited precision.
    Nonetheless: very good results for rc_v5
    Last edited by Sebastian; 25th November 2010 at 17:02.

  21. #18
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Look at that more carefully.

    With:

    (freq*qword(range)) >> SCALElog

    we do a 64-bit shift, which is slow (on 32-bit x86),

    and here:

    (qword(range)*(freq<<(32-SCALElog)))>>32

    we just take the high 32-bit dword of the qword, which is faster.
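
    For illustration (a hypothetical standalone snippet, not from the package), both forms written out; they produce identical results, and on a 32-bit target the second one lets the compiler take the product's high dword (EDX) directly instead of shifting a 64-bit value:

    Code:
    #include <stdint.h>
    
    enum { SCALElog = 15 };   // assumes freq < (1<<SCALElog), as in the coder
    
    // 64-bit product shifted right by SCALElog: needs a real 64-bit shift on x86-32.
    static inline uint32_t rnew_shift( uint32_t range, uint32_t freq ) {
      return (uint32_t)( (freq * (uint64_t)range) >> SCALElog );
    }
    
    // Pre-shift freq so the result is exactly the high dword of the 64-bit product.
    static inline uint32_t rnew_highdword( uint32_t range, uint32_t freq ) {
      return (uint32_t)( ( (uint64_t)range * (freq << (32-SCALElog)) ) >> 32 );
    }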

  22. #19
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    278
    Thanks
    6
    Thanked 29 Times in 18 Posts
    Nice, I didn't know that compilers are that clever

  23. #20
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Any ideas on how to optimize/generate that fsm table?
    > Brute-force is not really possible...

    It's bruteforce anyway, one way or another (a heuristic search
    with no clear end is still bruteforce imho).
    And the counter state machine is recurrent, so any change in the
    state map may affect the whole state sequence.
    So the counter fsm optimization algorithm imho looks more
    or less like this (a rough code sketch follows the list):
    1. Generate some initial set of states
    (ie enumerate all substrings up to length 64 in dumped histories)
    2. Try merging any two states into one and test how it affects
    compression - ideally the pair which provides the best compression
    after merging is chosen.
    3. Continue until you get the required number of states (ie 256)
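
    A rough skeleton of that merging loop (hypothetical sketch, not the actual optimizer; evaluate() here is just a stand-in for measuring real compression on the dumped histories):

    Code:
    #include <vector>
    #include <limits>
    
    // One counter state: successor states after a 0/1 bit, plus the probability it maps to.
    struct State { int next0, next1; unsigned short p0; };
    
    // Stand-in: a real optimizer would code the training data with this fsm
    // and return the compressed size.
    static double evaluate( const std::vector<State>& fsm ) { (void)fsm; return 0.0; }
    
    // Merge state b into state a: redirect every transition that led to b.
    // (Orphaned states are simply left in place to keep the sketch short.)
    static void merge( std::vector<State>& fsm, int a, int b ) {
      for( size_t i=0; i<fsm.size(); i++ ) {
        if( fsm[i].next0==b ) fsm[i].next0=a;
        if( fsm[i].next1==b ) fsm[i].next1=a;
      }
    }
    
    // Greedily merge the pair of states that hurts compression least,
    // until only target_states remain (steps 2 and 3 above).
    static void reduce( std::vector<State>& fsm, size_t target_states ) {
      size_t alive = fsm.size();
      while( alive > target_states ) {
        double best = std::numeric_limits<double>::max();
        int best_a = 0, best_b = 1;
        for( size_t a=0; a<fsm.size(); a++ )
          for( size_t b=a+1; b<fsm.size(); b++ ) {
            std::vector<State> trial = fsm;
            merge( trial, int(a), int(b) );
            double size = evaluate( trial );
            if( size<best ) { best=size; best_a=int(a); best_b=int(b); }
          }
        merge( fsm, best_a, best_b );
        alive--;
      }
    }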

    > Can you tell how you did that?

    I just generated a definition for my optimizer (it's only 31x256 bits in the end)
    from a simple ad hoc fsm (its result on book1bwt was 219476) and it was
    running for 10 hours or so.

    Note that this fsm is somewhat overtuned though - enwik8bwt result is
    not any better than with plain linear counters.
    But it's a matter of the choice of optimization target in the end.

    Btw, here's a partial dump of rc_v5 fsm state-history map:
    http://nishi.dreamhosters.com/u/fsm-213203-depth13.txt
    Unfortunately it only lists 156 states out of 256, but I only
    reached 244 states by enumerating all bitstrings up to length 29, so...

    Anyway, it's pretty interesting how bit counts have nearly nothing to
    do with state clustering :)

    > (qword(range)*(freq<<(32-SCALElog)))>>32 -> 5.0s to encode enwik8bwt
    > unsigned rnew = (range>>SCALElog)*freq; -> 4.2s to encode enwik8bwt
    > the first one gives better compression

    In theory, yes, but actually it depends on parameter optimization.
    When model parameters are tuned to "imprecise" range update
    (note that it cuts the 0's interval due to rounding), the "precise"
    one starts showing worse results.
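
    A small numeric illustration (hypothetical, just to show the rounding difference): the "imprecise" form drops the low 15 bits of range before multiplying, so the 0's interval comes out slightly smaller:

    Code:
    #include <stdio.h>
    typedef unsigned long long qword;
    enum { SCALElog = 15 };
    
    int main( void ) {
      unsigned range = 0x01234567, freq = 12345;
      unsigned precise   = (unsigned)( (qword(range)*(freq<<(32-SCALElog)))>>32 ); // == (range*freq)>>15
      unsigned imprecise = (range>>SCALElog)*freq;                                 // drops low bits of range first
      printf( "precise=%u imprecise=%u diff=%u\n", precise, imprecise, precise-imprecise );
      return 0;
    }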

    > but why is the first one faster than ((freq*uint64_t(range)) >> SCALElog ?
    > I tested it, but sticked to your variant.

    Because >>32 on x86-32 means "result is in EDX after multiplication",
    and qword>>15 requires complicated shifting with two registers and a few
    instructions - in fact dumber compilers (like VC) may even put a library call there.

    P.S.:

    I was too slow :)

    > Nice did not know that compilers are that clever

    They're not actually - only newer IntelC and gcc would notice that.

  24. #21
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    http://nishi.dreamhosters.com/u/rc_v1c.rar

    Continuing from http://encode.su/threads/1153-Simple...ll=1#post22888
    there's a new version of multiplication-free rc with range in log domain.
    In this one the table_size/precision tradeoff is actually configurable,
    so in conjunction with some fsm counters it might actually work as a speed
    optimization somewhere.

    This is the function that does encoding/decoding in rc_v0:
    Code:
    uint Process( uint freq, uint bit ) { 
      uint rnew = (qword(range)*(freq<<(32-SCALElog)))>>32;
      if( DECODE ) bit = (code>=rnew);
      bit ? range-=rnew, (DECODE ? code-=rnew : lowc+=rnew) : range=rnew;
      while( range<sTOP ) range<<=8, (DECODE ? (code<<=8)+=getc(f) : ShiftLow());
      return bit;
    }
    And this is the corresponding function from rc_v1c:
    Code:
    uint Process( uint freq, uint bit ) {
      // "range" is kept in the log domain; lr0/lr1 are the log-range values for bit==0 / bit==1
      uint lr0 = range + T_log[freq];
      uint lr1 = range + T_log[SCALE-freq];
      // reconstruct the linear-domain split point from mantissa (T_exp) and exponent
      uint rnew = T_exp[lr0&T_mfr] >> (31-(lr0>>T_lfr));
      if( DECODE ) bit = (code>=rnew);
      bit ? range=lr1, (DECODE ? code-=rnew : lowc+=rnew) : range=lr0;
      enum { TOP=24<<T_lfr, INC=8<<T_lfr, DEC=SCALElog<<T_lfr };
      // renormalization: +INC in the log domain corresponds to <<8 in the linear domain
      while( range<TOP ) range+=INC, DECODE ? (code<<=8)+=getc(f) : ShiftLow();
      range -= DEC;
      return bit;
    }
    Here, it should be obvious that the T_log table can be merged
    with counter LUT, which would significantly reduce the required memory
    (with SCALE=1<<13, T_log takes 16k).
    So with counter fsm it would make sense to use it with 5k LUTs or maybe 9k.
    But for now I only did some tests with 64k tables (columns: encode time, decode time, compressed size; the iphone rows also show the slowdown vs the PC for enc/dec):

    Code:
    [Q9450 @ 3.52Ghz]
    rc_v0    3.937s   7.657s 61256047
    rc_v1c   5.516s   8.625s 61268641
    
    [iphone]
    rc_v0  258.386s 170.973s 61256047  65.6x 22.3x
    rc_v1c 453.061s 334.892s 61268641  82.6x 38.8x
    1.5x slower decoding on PC is a quirk of IntelC inlining, basically
    an artifact of getc/putc i/o, but I think it doesn't matter in this test.

    Anyway, here we can see that the compression overhead is ~0.02%, which is much better
    than results of other multiplication-free ari coders.

    And also we can see that on iphone the actual problem is read speed
    (encoding is relatively slow because it has to read a bigger file),
    and having to access 64k of LUTs is much worse than some multiplications.

  25. #22
    Member
    Join Date
    Aug 2012
    Location
    Moscow
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello,

    Since I found it several months ago and find it very useful: do you have any updated
    version of your Multi Alphabet Range Encoder? (For example with the new table you added.)

    Thank you very much for your interesting work, there are lots of things to learn!
    Last edited by compressme; 31st August 2012 at 11:01.

  26. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > do you have any updated version of your Multi Alphabet Range Encoder?
    > (For example with the new table you added)

    There're two different methods:
    http://compression.ru/sh/marcdemo.rar (base-N rangecoder)
    and
    http://nishi.dreamhosters.com/u/marc_v1.rar
    (encoding via decoding with specific probability distribution)

    I like the latter approach more, because it's simpler and faster,
    but it's not compatible with some rc optimizations
    (including multiplication workarounds, if you meant that).
    The problem's that these optimizations leave unused intervals
    in code space, but this coder has to be able to "decode" any
    random file, as decoding and encoding are reversed for it.
    So various tricks, including carryless designs, can't be applied there.

    However, the first type is compatible with most tricks, if you
    need them for some reason - it's just necessary to replace all
    the <<8 stuff with *N basically, so it ends up slower anyway.

    Now that I think, I have a newer implementation of that type:
    http://nishi.dreamhosters.com/u/marc_v2.rar

    Btw, I'd really appreciate another workaround (besides macros and #include)
    for gcc's template blindness - gcc basically won't automatically find names inherited from a dependent base in a template

    Example:
    Code:
    template< int i1 >
    struct T1 {
      enum{ j1=i1 };
    };
    
    template< int i2 >
    struct T2 : T1<i2> {
      using T1<i2>::j1; // WTF
      enum{ j2=i2+j1 };
    };
    (Compare http://ideone.com/ccd1j and http://ideone.com/xP6MI )
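
    For what it's worth, another standard-compliant spelling (my own note, not from the thread): qualify the dependent base explicitly for the enum, or go through this-> for member functions:

    Code:
    template< int i1 >
    struct T1 {
      enum { j1=i1 };
      int m1() { return j1; }
    };
    
    template< int i2 >
    struct T2 : T1<i2> {
      enum { j2 = i2 + T1<i2>::j1 };       // qualify the dependent base explicitly
      int m2() { return this->m1() + j2; } // for member functions, this-> also works
    };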

    Unfortunately the "recommended" method with "using" directives is ugly
    (imagine typing in all the dependencies manually) and doesn't even always work
    ("using" isn't applicable to template methods), so I usually end up emulating
    templates with macros for gcc.
    Am I doing something wrong?..

  27. #24
    Member
    Join Date
    Aug 2012
    Location
    Moscow
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello,

    Thank you for your quick answer and your updated implementation!
    Is there any reason that the Predictor class is not used with the Multi Alphabet Range Coder?
    (For example, it is used by rc_v1c.rar, but not with the MARCs.)
    I didn't manage to put them together...

    I will do some research about the nested templates in gcc, it's a good question by the way...

    Thanks for your time !

  28. #25
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > Is there any reason that the Predictor class is not used with the Multi Alphabet Range Coder ?
    > (For example it is used by rc_v1c.rar, but not with MARCs)

    The Predictor class is not used in the coders fully designed by me, while the rc_vX series evolved
    from Mahoney's codec design.

    > I didn't manage to put them together...

    Funny thing is that there was some kind of model in this new coder initially;
    I just removed it before posting, to simplify things.
    Originally the model looked kinda like this:
    Code:
    // Counter, the M_* parameters, DIM, rc_*, get/put and yield are defined elsewhere in the coder
    template< int mode >
    struct Model : Rangecoder<mode> {
      uint f_len;
      Counter f0[256];
    
      void do_process( void ) {
        uint b,bit,c,i,ctx;
        for( i=0; i<DIM(f0); i++ ) f0[i].P=M_f0P0;
        rc_Init();
    
        for( i=0; i<f_len; i++ ) {
          if( mode==0 ) c=get();
          for( b=7,ctx=1; b!=-1; b-- ) {
            if( mode==0 ) bit = (c>>b)&1;
            bit = rc_BProcess( f0[ctx].P, bit );
            f0[ctx].Update( bit, M_f0wr, M_f0mw );   
            ctx += ctx + bit;
          }
          c = byte(ctx);
          if( mode==1 ) put(c);
        }
    
        rc_Quit();
        yield(this,0);
      }
    
    };
    You can see that kind of model eg. in http://nishi.dreamhosters.com/u/o01_v0.rar


    > I will do some research about the nested templates in gcc, it's a good question by the way...

    Btw, VS and IntelC don't have that problem, so I just don't use gcc for development.
    It's easy enough to make the sources compatible with it afterwards, but too annoying to
    maintain the workarounds during the development.

    Unfortunately gcc's behavior is caused by standard compliance -
    see http://stackoverflow.com/questions/11405/gcc-problem

  29. #26
    Member
    Join Date
    Aug 2012
    Location
    Moscow
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Ok, too bad for gcc, if that's what the standard says...

    Thank you for the example, but I'm wondering what the Counter class used here is. Do you have any sample? (It is not present in o01_v0.rar, for example.)
    (It is not present in o01_v0.rar for example)

    Quote Originally Posted by Shelwien View Post
    Funny thing is that there was some kind of model in this new coder initially,
    I just removed it before posting, to simplify things
    I understand; if you have any sample too, I'm interested in doing some benchmarks!

    Quote Originally Posted by Shelwien View Post
    Predictor class is not used in coders fully designed by me, while rc_vX series evolved
    from Mahoney's codec design.
    Ok !

    Thank you for your quick answers :]

  30. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    > I'm wondering what the Counter class used here is.
    > (It is not present in o01_v0.rar for example)

    Uh, but it _is_ there, right at the start of the .cpp file too...

    > I understand; if you have any sample too, I'm interested in doing some benchmarks!

    Ok, here's marc_v2 with full order0 model:
    http://nishi.dreamhosters.com/u/marc_v2a.rar

    I don't see much point in benchmarking it though - normally base64-like coding
    and compression are performed by different programs.

  31. #28
    Member
    Join Date
    Aug 2012
    Location
    Moscow
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Uh, but it _is_ there, right at the start of the .cpp file too...
    Oh sorry, I looked at it too quickly; I didn't think it was right next to the main function... :s

    Just found the concept interesting, as I'm using this kind of encoding algorithm from time to time.
    I will try to understand how it works by digging deeper into the code

    Thank you !

  32. #29
    Member
    Join Date
    Jun 2019
    Location
    Pune
    Posts
    4
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello Shelwien,

    I am in the learning phase of range encoding. From what I've read, range encoding maps a particular sequence onto a chosen number scale: at the encoder side we end up with a code built from the probabilities of the symbols in the input string, and at the decompressor side this encoded input (the code) is used to regenerate the input string. But after seeing your code, where "code" is updated (the parameter gets subtracted from; an image is attached showing this), I got confused: why do we need to update the "code" parameter when it is an input to us?

    I request you to please help me in understanding this.

    Regards,
    Hariom
    Attached: RangeEncoder.PNG

  33. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,267
    Thanks
    200
    Thanked 985 Times in 511 Posts
    You can see "subbotin.cpp" - its more symmetric in that sense, since it needs "low" value to avoid carry.

    Of course you can just read compressed data to "code", then never modify it and compare intervals to "code-low" instead.
    Its just a speed optimization.
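
    A minimal sketch of what that means (my own illustration, not from any of the packages; renormalization omitted): the decoder can either keep "low" and compare code-low against the split point, or fold the subtraction into "code" once per bit, as the rc_v0 decoder does:

    Code:
    enum { SCALElog = 15 };
    
    // Variant 1: keep an explicit "low" and compare (code - low) against the split point.
    static unsigned decode_bit_with_low( unsigned& low, unsigned& range, unsigned code, unsigned freq ) {
      unsigned rnew = (unsigned)(((unsigned long long)range * freq) >> SCALElog);
      unsigned bit = (code - low) >= rnew;
      if( bit ) { low += rnew; range -= rnew; } else range = rnew;
      return bit;
    }
    
    // Variant 2 (the rc_v0 style): fold the subtraction into "code" itself; no "low" is kept.
    static unsigned decode_bit_no_low( unsigned& code, unsigned& range, unsigned freq ) {
      unsigned rnew = (unsigned)(((unsigned long long)range * freq) >> SCALElog);
      unsigned bit = code >= rnew;
      if( bit ) { code -= rnew; range -= rnew; } else range = rnew;
      return bit;
    }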

    I mean this: https://encode.su/threads/3084-Simpl...-short-strings

