Results 1 to 13 of 13

Thread: nanozip decoder source

  1. #1
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts

    nanozip decoder source

    Same origin, I'm just posting.
    Alpha version, mostly supports -cO and -cc single-threaded decoding, problems with some audio blocks.

    Please help with -cc decrypting, it seems related to lpaq.
    Attached Files Attached Files

  2. Thanks (3):

    birdie (30th January 2020),Simorq (26th January 2020),xinix (26th January 2020)

  3. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    - replaced coroutine class with a getc/putc stub to simplify things;
    - turned all functions to class methods, removed t-> prefixes;
    - split functions and structures to multiple include files;
    - replaced most of dynamic allocation with static;
    - merged most init functions;

    Code:
    book1 768771->202310 c.time=1.031s d.time=1.031s
    Attached Files Attached Files

  4. Thanks (2):

    Simorq (26th January 2020),xinix (26th January 2020)

  5. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    Decrypted an interesting counter.
    It looks like this in the code:
    //  kDivideLookup[0] = 255;
    // for( i=1; i<255; i++ ) kDivideLookup[i] = 1024 / (3 + i * 2);
    // kDivideLookup[255] = 1;
    uint C_Update1( uint x, uint bit, uint lim ) {
    byte tt = byte(x);
    uint z = (bit<<23) - (x>>9);
    uint y = kDivideLookup[tt] * z;
    return (y & 0xFFFFFF00) + x + (tt<lim);
    }


    but essentially works kinda like this:
    union C1 {
    uint X;
    struct {
    uint N :8;
    uint P :32-8;
    };
    };

    uint C_Update1( uint _x, uint bit, uint lim ) {
    C1 x; x.X=_x;
    if( bit==0 ) {
    x.P -= 2*x.P/(x.N*2+3);
    } else {
    x.P += 2*((1<<24)-x.P)/(x.N*2+3);
    }
    x.N += (x.N<lim);
    return x.X;
    }


    This version also improves book1 compression from 203310 to 202412.

    In other words, its something like P = (1<<24)*(N1+0.75)/(N0+0.75+N1+0.75)

    The interesting part is the "vector" update: x + (tt<lim) + (y & 0xFFFFFF00)
    updates both occurrence counter and the probability parts without shifts.

    Update: Sorry, it was actually 202310 in original version. I guess limited precision division via multiplication by LUT
    has the effect of slightly slowing down the update rate for higher N.
    I tried and got 202300 like this:
      if( bit==0 ) {
    x.P -= 4*x.P/(x.N*5+5);
    } else {
    x.P += 4*((1<<24)-x.P)/(x.N*5+5);
    }

  6. Thanks (3):

    Mike (26th January 2020),Noragami (26th January 2020),xinix (26th January 2020)

  7. #4
    Member
    Join Date
    Nov 2013
    Location
    Kraków, Poland
    Posts
    807
    Thanks
    245
    Thanked 257 Times in 160 Posts
    Quote Originally Posted by Shelwien View Post
    P = (1<<24)*(N1+0.75)/(N0+0.75+N1+0.75)
    Reminds "Krichevski-Trofimov (KT) estimator" from CTW: http://mattmahoney.net/dc/dce.html#Section_423

    But this kind of approach poorly adapts to local situation - finding average from beginning.
    For adapting to local situation it is better to use exponential moving average based, like
    CDF += (newCDF - CDF) >> rate

  8. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    1) N is actually limited (255 at most), when limit is reached, it starts updating at fixed rate.

    2) Its actually used for APM counters. lpaq-like FSM counter state is mapped to actual probability using this.

  9. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    - refactored FSM counter (former kLzModelLNext);
    - refactored APM counter (the variable-rate one above);
    - refactored main submodels (o1,o2,o3,o4,o6,o10,o24);

    An example. Other submodels look the same, just with different indexes.
      // order1 FSM+APM+stretch
    C_Update1(modele[0].ptr,bit,128);
    C_Update0(ptrs8[0],bit);
    ptrs8[0] = &modelv[ ((out_byte<<8)&0xFF00) + L_LUT[out_byte_wip] ];
    modele[0].ptr = &modele[0].model[*ptrs8[0]];
    factors[0] = stretch1( modele[0].ptr );

    // order2 FSM+APM+stretch
    C_Update1(modele[1].ptr,bit,254);
    C_Update0(ptrs8[1],bit);
    ptrs8[1] = &modelw[ ((out_byte<<8)&0xFFFF00) + L_LUT[out_byte_wip] ];
    ptrs8_cur[1] = *ptrs8[1];
    modele[1].ptr = &modele[1].model[*ptrs8[1]];
    factors[1] = stretch1( modele[1].ptr );
    byte hist_and = ptrs8_cur[1]!=0;
    byte hist_sum = hist_and;

    // hashed order3 FSM+APM+stretch
    C_Update1(modele[2].ptr,bit,254);
    C_Update0(ptrs8[2],bit);
    if( bit_index_in_byte==0 ) {
    hashes_arr28[2] = (out_byte<<8) ^ byte(out_byte) ^ 0xFFFFFF00;
    uint hash2 = 33595399 * hashes_arr28[2] ^ (hashes_arr28[2] >> 11);
    ptrs8[2] = CM_GetHashSlot(hash2) + 4;
    } else if( bit_index_in_byte==4 ) {
    ptrs8[2] = CM_PtrGetX(ptrs8[2], (byte)(out_byte_wip ^ hashes_arr28[2]));
    } else {
    uint incr = (bit + 1) << ((bit_index_in_byte & 3) - 1);
    ptrs8[2] += incr*4;
    }
    ptrs8_cur[2] = *ptrs8[2];
    modele[2].ptr = &modele[2].model[ptrs8_cur[2]];
    factors[2] = stretch1( modele[2].ptr );
    hist_and &= (ptrs8_cur[2]!=0);
    hist_sum += hist_and;
    Attached Files Attached Files

  10. Thanks (2):

    birdie (30th January 2020),Sergey3695 (28th January 2020)

  11. #7
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    What's the objective here? I might be missing some history... NanoZip source code lost?

  12. #8
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    Well, its another reverse-engineering project, but target is not Christian's, so nobody complains.
    Also author died.

    It was never open-source (although "nanozip -cc" aka nzcc seems to be a speed-optimized lpaq).

  13. #9
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    Quote Originally Posted by Shelwien View Post
    Also author died.
    Sad to hear!
    No one in his neighborhood is able to share his code?
    Maybe someone is still reading private messages on: https://fi.linkedin.com/in/runsas ?

  14. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts

  15. #11
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    513
    Thanks
    210
    Thanked 352 Times in 188 Posts
    Quote Originally Posted by Shelwien View Post
    Decrypted an interesting counter.
    It looks like this in the code:
    //  kDivideLookup[0] = 255;


    The interesting part is the "vector" update: x + (tt<lim) + (y & 0xFFFFFF00)
    updates both occurrence counter and the probability parts without shifts.
    ​Setting lower bound limit allows to avoid shift.
    KZo


  16. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    - submodel detector based on entropy estimation
    - bitmodel - unaligned bit model with 17-bit context
    - wordmodel (parses last english word until length 128 or non-letter)
    - sparse order1 FSM+APM: xxxxxxxx 00000000; disabled if normal order1 gives better results
    - masked order3 FSM+APM: xxx00000 xxx00000 xxx00000; only used for high 5 bits
    - logistic mixing with context based on wordmodel and matchmodel
    - interpolated SSE<13>
    - adaptive logistic extrapolation ("TSE")

    Code:
    202310   0.953s 0.985s // v0
    202310   1.000s 1.015s // v1
    202310   1.078s 1.093s // v2
    202310   1.063s 1.078s // v3
    
    202728   1.000s 1.016s // TSE disabled
    213440   1.000s 1.000s // SSE disabled
    221784   0.937s 0.937s // SSE+TSE disabled
    Main model mostly done, matchmodel remains.
    Attached Files Attached Files

  17. Thanks (2):

    kaitz (31st January 2020),Mike (1st February 2020)

  18. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    About hashtable...
    Its designed like this.
    There're 64-byte cells allocated per nibble.
    Each cell contains 4 slots for binary trees (interleaved, maybe to avoid *15) and a hash byte per slot.
    When a slot for an unknown hash byte is requested, a slot with min high-bit-counter value is chosen and cleared out, hash byte is replaced.
    When hash byte is known, it just normally returns a pointer to corresponding counter tree.

    Original code uses some tricky arithmetics there:
      byte v4 = (rp[5] < rp[4]);
    byte v5 = -(rp[6] >= rp[v4 + 4]);
    byte v6 = (v5 & v4) + (~v5 & 2);
    byte v7 = -(rp[7] >= rp[v6 + 4]);
    rp += (v7 & v6) + (~v7 & 3);

    But that's actually just finding the index of min of 4 bytes.

    New version looks like this:
    byte* __restrict FindCounter( byte* __restrict p, byte b ) {

    // find a slot with matching high byte of hash
    uint i = (p[0]==b) + 2*(p[1]==b) + 4*(p[2]==b) + 8*(p[3]==b) + 16;
    i = _bit_scan_forward(i);
    if( i<4 ) return p+4+i;

    // find a slot with min counter (for high bit)
    i = (p[4+1]<p[4+0]) ? 1 : 0;
    i = (p[4+2]<p[4+i]) ? 2 : i;
    i = (p[4+3]<p[4+i]) ? 3 : i;

    // claim the slot for current context
    p[i] = b;

    // init the counters
    p += 4+i;
    for( i=0; i<15; i++ ) p[4*i]=0;

    return p;
    }

    // input: p = high nibble counter location, b = hash byte
    byte* __restrict CM_PtrGetX( byte* __restrict p, byte b ) {
    p -= (p-((byte*)0)) & 63; // align 64
    p += 64 * (b|1); // |1 just to always get a different slot than high nibble?
    return FindCounter( p, b );
    }

    byte* __restrict CM_GetHashSlot( uint hash ) {
    return FindCounter( &some_ptr[(hash<<6) & some_mask], hash>>24 );
    }


    And apparently tricky arithmetics don't really help - speed didn't change, and new code is still branchless.
    There seemed to be some potential for vectorization, so I tried rewriting it with _mm_minpos_epu16 (for finding p[i]==b and for finding min),
    and vector AND 0xFFFFFF00 for clearing out the slot, but speed didn't improve.

    Some choices are pretty random though - *15 would be probably better for caching (and imho it would be better to use 16 slots and vector math),
    clearing the slot saves 4 bytes on book1, and (b|1) loses 4 bytes.

  19. Thanks (2):

    comp1 (2nd February 2020),Mike (2nd February 2020)

Similar Threads

  1. NanoZip - a new archiver, using bwt, lz, cm, etc...
    By Sami in forum Data Compression
    Replies: 305
    Last Post: 27th July 2020, 12:48
  2. Nanozip LZT ?
    By cbloom in forum Data Compression
    Replies: 47
    Last Post: 11th February 2017, 07:42
  3. Code generation in LZ decoder / Branchless LZ77 Decoder
    By Shelwien in forum Data Compression
    Replies: 1
    Last Post: 30th September 2010, 20:48
  4. Preprocessors and filters and Nanozip??
    By kampaster in forum Data Compression
    Replies: 18
    Last Post: 9th July 2010, 20:42
  5. UNZ - minimal LZW decoder + source
    By encode in forum Forum Archive
    Replies: 7
    Last Post: 29th January 2008, 14:54

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •