Page 1 of 4
Results 1 to 30 of 92

Thread: Experiments with orderN-CM

  1. #1
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts

    Experiments with orderN-CM

    Hey folks,

Recently I had the time to play with my orderN-CM code again.

Basically it's a stacked 2-input mixer chain in the linear domain, fed by simple occurrence counters.

I tested some variants:
- an SSE stage after each mixer (Mixer-SSE)
- an SSE stage after each counter (Counter-SSE)

The SSE context is the full order-1 context plus 1 bit indicating whether the higher context escaped.

Sadly, memory consumption is very high for exactly the combination that helps Counter-SSE most: many SSE bins plus a good SSE context.

Here are some results on SFC with order 6; the parameters fell out of my hat.

As one can see, english.dic, fp.log and vcfiu.hlp improve by a remarkable amount.

    Comments are welcome.
    Last edited by Sebastian; 17th November 2011 at 09:51.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    I'd say, you have to do better than LZ at least (on all files).
    Here's a plzma comparison:
    Code:
                  c0       c2       c1    plzma
    A10_jpg       831815   835045   828269   825041
    acrord32_exe 1403010  1378653  1367620  1460469*
    english_dic   792604   746703   740661   537977
    FlashMX_pdf  3706341  3687700  3655476  3709109*
    fp_log        703043   676218   664495   551603
    mso97_dll    1799234  1774421  1766123  1833167*
    ohs_doc       769113   765173   756387   813697*
    rafale_bmp    986433   954963   946421   757733
    vcfiu_hlp     609362   594450   590366   638729
    world95_txt   566501   554437   553220   494670
                12167456 11967763 11869038 11622195
    Your results marked with * are worse than lzma; note that plzma doesn't have any filters (not even e.

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    zpaq -m3 (each file compressed separately):
    a10.jpg 842468 -> 828408
    acrord32.exe 3870784 -> 1337088
    english.dic 4067439 -> 490812
    FlashMX.pdf 4526946 -> 3648332
    fp.log 20617071 -> 433180
    mso97.dll 3782416 -> 1691666
    ohs.doc 4168192 -> 758166
    rafale.bmp 4149414 -> 734358
    vcfiu.hlp 4121418 -> 523829
    world95.txt 2988578 -> 460485
    Total 53134726 -> 10906324. 111.425 MB memory per thread needed.

    -m3 uses mid.cfg, an order 0 ICM, order 1..5 ISSE chain, order 7 match model and order 1 logistic mixer.

  4. #4
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Matt, could you test without the match model and the final mix?

Sure, there are lots of things to do - but my aim is mainly text compression.

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Code:
    Memory (MB)   Size  Compressed Checksum File
    ------   ---------  ---------- -------- ----
        77      842468      833191 4f517b12  maxcomp\a10.jpg
        77     3870784     1399365 ab98775e  maxcomp\acrord32.exe
        77     4067439      505453 29c9a25b  maxcomp\english.dic
        77     4526946     3684853 00252216  maxcomp\FlashMX.pdf
        77    20617071      568996 43777ed0  maxcomp\fp.log
        77     3782416     1753356 ce2de546  maxcomp\mso97.dll
        77     4168192      813803 05bdfbac  maxcomp\ohs.doc
        77     4149414      742319 eca398cd  maxcomp\rafale.bmp
        77     4121418      631594 846be1a5  maxcomp\vcfiu.hlp
        77     2988578      546257 d840a0c1  maxcomp\world95.txt
Total size is 11,479,187. (Listing is from zpaq 4.01, not released yet.) Config file is mid.cfg with just an order 0..5 ICM-ISSE chain, no match model or mix. $1 is 0.

    Code:
    comp 3 3 0 0 6 (hh hm ph pm n)
      0 icm 5        (order 0...5 chain)
      1 isse 13 0
      2 isse $1+17 1
      3 isse $1+18 2
      4 isse $1+18 3
      5 isse $1+19 4
    hcomp
      c++ *c=a b=c a=0 (save in rotating buffer M)
      d= 1 hash *d=a   (orders 1...5 for isse)
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      halt
    post
      0
    end

  6. #6
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
You're far, far away from my results.

Let's look only at "world95.txt":

order 5: 571.915, and with mixer-SSE: 552.105

Back to school

  7. #7
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Surprisingly, the rolling hash function made a not-so-small difference.

Results for order 6 and SFC, only using mixer-SSE with 32 bins.

Current results with both SSE types, order 8 and a hash size of 2^28:

    Code:
    A10_jpg        824.626
    acrord32_exe 1.417.082
    english_dic    538.978
    FlashMX_pdf  3.684.293
    fp_log         560.623
    mso97_dll    1.776.007
    ohs_doc        804.227
    rafale_bmp     751.533
    vcfiu_hlp      597.576
    world95_txt    492.401
    ----------------------
                11.447.346
    Last edited by Sebastian; 23rd November 2011 at 10:56.

  8. #8
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I added a last-bit state to the SSE stages; params were untouched.

    order8 - hash size 2^28
    Code:
                 mix-sseV2  +cnt-sseV2
    ----------------------------------
    A10_jpg        826.922     826.894
    acrord32_exe 1.389.579   1.387.975
    english_dic    531.214     514.601
    FlashMX_pdf  3.682.512   3.689.087
    fp_log         556.560     541.961
    mso97_dll    1.750.952   1.751.652
    ohs_doc        801.595     796.364
    rafale_bmp     750.042     751.762
    vcfiu_hlp      587.341     586.936
    world95_txt    484.757     487.563
    ----------------------------------
                11.361.480  11.340.842
Seems that the counter-SSE is now more or less obsolete.

  9. #9
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I added the possibility to use different mixer rates and init values for every order.

I added some ad-hoc mixer rates and left the init values untouched;
every option is specified in a "profile.txt".

Using order 8 and a hash size of 2^28, occupying around 1 GB of memory:

    Code:
    ------------------------
    A10_jpg         826.947   
    acrord32_exe  1.385.276   
    english_dic     529.755   
    FlashMX_pdf   3.685.211   
    fp_log          556.391 
    mso97_dll     1.745.209  
    ohs_doc         801.029    
    rafale_bmp      746.256   
    vcfiu_hlp       588.866   
    world95_txt     482.780
    -----------------------
                 11.347.720
Next on the list:
- convert the frequency counters into FSMs to reduce memory need
- add a final logistic mix
- reanimate the optimizing module

Perhaps someone can enlighten me. My current hashing looks like this:

- at byte boundaries: compute a rolling hash for every order
- at bit boundaries: compute a rolling temp hash for every bit

I read about something like "nibble" boundaries. What does that mean?
I tried updating the bit hashes on 4-bit boundaries, but that did worse.

Attached is a current compile. Use at your own risk! There is no error checking for too little memory and so on.
Attached Files
    Last edited by Sebastian; 25th November 2011 at 20:49.

  10. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    > convert the frequency counters into fsm to reduce memory-need

It'd be really cool if you could find some solution different
from the frequency quantization used by everyone else.
    Also consider counter states with >8 bits (you won't save any memory like
    that, but such a state can store more information about context history and
    thus provide better predictions) and state-machine counterparts for other
    components (eg. mixers).

    > I read about something like "nibble" boundaries. What does that mean?

    Nibble is 4 bits, yeah.
    And the things you've seen are probably about cache line considerations.
Commonly a CM compressor spends like 70% of its time waiting on counter lookups,
so any way of eliminating cache misses really helps.
And as to nibbles, the simplest way is when the partial byte context layout is
like this: ctx = (256+c)>>(8-bits); // bits=0..7, count of processed bits of byte c
    In this case the first memory lookup on bits=0 reads the counters for first 4-5 bits
    into the cache, but after that every next bit's counter is in its own separate
    cache line, which is bad.
    And the workaround is to change the layout to binary tree per nibble, instead of per byte.
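The two layouts can be sketched like this (my naming): in the per-byte tree, the node index for the already-coded bits of byte c is the leading-1-prefixed prefix, so the later bits of a byte scatter across cache lines; the per-nibble tree restarts the index every 4 bits, so one 15-node subtree fits in a single 16-byte row.

```cpp
#include <cstdint>

// Per-byte tree: c = the full byte, bits = how many of its bits are coded.
// Node index is 1..255 -- one binary tree spanning the whole byte.
inline uint32_t byteNode(uint32_t c, int bits) {   // bits = 0..7
  return (256 + c) >> (8 - bits);
}

// Per-nibble tree: nib = the full nibble, bits = coded bits within it.
// Node index is 1..15 -- the whole subtree fits in one 16-byte row.
inline uint32_t nibbleNode(uint32_t nib, int bits) {   // bits = 0..3
  return (16 + nib) >> (4 - bits);
}
```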

    Also see http://mattmahoney.net/dc/dce.html#Section_412
But that's the wrong way to do it

  11. #11
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    ZPAQ uses nibble boundaries to update context hashes, but contexts always start on byte boundaries. So on a byte boundary, the context is a hash of the last 8n bits. On a nibble boundary, the context is a hash of the last 8n + 4 bits. The hash is an index into a table of 16-byte rows that include one 8-bit checksum confirmation and 15 bit histories. The elements of the row are indexed by the 0 to 3 bits that follow the nibble boundary.

    A bit history is an 8 bit state that represents a pair of counts and a last bit, like (N0,N1,LB). It is updated by incrementing either N0 or N1 and setting LB to the last bit, with some bounds on the counts that send it back to a smaller count pair if needed. This is all done by a 256x2 table lookup on the current state and the update bit.
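A toy version of such a state machine (the real ZPAQ table is hand-tuned; the cap and decay rule below are made-up illustrations): enumerate all (N0,N1,LB) triples with bounded counts and precompute a 2-column transition table, so the per-bit update becomes a single lookup, just as described.

```cpp
#include <array>
#include <vector>

// Toy (N0,N1,LB) bit-history FSM: counts capped at CAP, the opposite count
// decays on update, LB tracks the last bit. ZPAQ's real table is a
// hand-tuned 256-state design; this just enumerates every bounded triple.
constexpr int CAP = 4;

struct BH { int n0, n1, lb; };

inline int stateId(int n0, int n1, int lb) {
  return (n0 * (CAP + 1) + n1) * 2 + lb;
}

inline BH nextState(BH s, int bit) {
  if (bit) { if (s.n1 < CAP) ++s.n1; if (s.n0 > 2) s.n0 = 2; }
  else     { if (s.n0 < CAP) ++s.n0; if (s.n1 > 2) s.n1 = 2; }
  s.lb = bit;
  return s;
}

// Precompute the 2-column transition table; per-bit update is one lookup.
inline std::vector<std::array<int, 2>> buildTable() {
  std::vector<std::array<int, 2>> t((CAP + 1) * (CAP + 1) * 2);
  for (int n0 = 0; n0 <= CAP; ++n0)
    for (int n1 = 0; n1 <= CAP; ++n1)
      for (int lb = 0; lb < 2; ++lb)
        for (int b = 0; b < 2; ++b) {
          BH u = nextState(BH{n0, n1, lb}, b);
          t[stateId(n0, n1, lb)][b] = stateId(u.n0, u.n1, u.lb);
        }
  return t;
}
```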

One element of the row is used as a hash confirmation. The ZPAQL program computes a hash, h, on byte boundaries. On nibble boundaries between bytes, the hash is updated to h+16*1xxxx where 1xxxx is a number in the range 16-31 depending on the previous 4 bits. For a table of size 2^n rows, the low n bits index the table and the next higher 8 bits are used as the checksum. During lookup, it looks for a matching checksum at h, h xor 1, and h xor 2. The table is aligned on a 64 byte memory address, so these 3 addresses stay in the same cache line. If it doesn't find a match, then it replaces the least frequently used row, which is estimated from N0+N1 of the first bit history. The states are ordered by increasing N0+N1 to make this a simple byte comparison.
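The lookup can be sketched like this (simplified: my field layout, no nibble splitting, and the bit-history byte itself standing in for the N0+N1 ordering):

```cpp
#include <cstdint>
#include <cstring>

// 16-byte row: 1 checksum byte + 15 bit-history bytes. Probing h, h^1, h^2
// keeps all three candidates near one cache line; on a full miss, the row
// whose first bit history implies the smallest use count is recycled.
struct Row { uint8_t chk; uint8_t hist[15]; };

template <int LOGN>
struct CtxHashTable {
  static constexpr uint32_t N = 1u << LOGN;
  Row rows[N] = {};
  Row* find(uint32_t h) {
    uint32_t idx = h & (N - 1);
    uint8_t chk = (uint8_t)(h >> LOGN);          // next higher 8 bits
    uint32_t probes[3] = { idx, idx ^ 1, idx ^ 2 };
    for (uint32_t p : probes)
      if (rows[p].chk == chk) return &rows[p];   // checksum confirmed
    Row* victim = &rows[probes[0]];              // evict least-used row;
    for (uint32_t p : probes)                    // states ordered by use, so
      if (rows[p].hist[0] < victim->hist[0])     // a byte compare suffices
        victim = &rows[p];
    std::memset(victim, 0, sizeof(Row));
    victim->chk = chk;
    return victim;
  }
};
```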

It is possible to compute any hash function you want in ZPAQL. However there are 2 instructions that compute the hash h=(h+c+512)*773, which gives a good distribution in the low 8n bits for an order-n context. I picked 773 = (3<<8)+5 because it can be computed efficiently with x86 shift, add, and lea instructions, although in my tests, imul is just as fast.
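To see the 773 = (3<<8)+5 decomposition concretely, both forms below compute the same hash step (identical modulo 2^32):

```cpp
#include <cstdint>

// One ZPAQL HASH step, h = (h + c + 512) * 773, written two ways:
// as a plain multiply, and as shifts/adds via 773 = (3<<8) + 5.
inline uint32_t hashStep(uint32_t h, uint32_t c) {
  return (h + c + 512) * 773;
}
inline uint32_t hashStepShiftAdd(uint32_t h, uint32_t c) {
  uint32_t x = h + c + 512;
  return ((x + (x << 1)) << 8) + x + (x << 2);   // 3x*256 + 5x = 773x
}
```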

  12. #12
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Thanks for your comments; linear probing is already done up to 8.
    Last edited by Sebastian; 26th November 2011 at 01:42.

  13. #13
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Some results with order 8 (enwik8 is currently better with order 6) and 1 GB of memory, which could probably be reduced to about 400 MB, of which 250 MB is SSE:

    Code:
    calgary.tar    728.413 Bytes
    enwik8      22.151.330 Bytes

  14. #14
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I tested some more things:

- a floating-point linear mixer which minimizes coding cost (I expected some precision problems); it did not help in the cascade, whatever I tried. Somehow I remember doing this before.

- an order-N final mix for orders 1..N, also a linear mixer which minimizes coding cost (currently floating point, but it doesn't look too hard to convert to integer)

The context is only the maximum available order; more context info hurts, or I am too dumb to find something useful.

Same params as before (order 8 and a 2^28 hash table), an ad-hoc learning rate, no SSE for the final mix. I'd like to post some more today.

Sadly fp_log took a huge drop, but this file is special as it prefers high orders and should be handled correctly once I add a match model (files marked blue).

    Code:
    start mix at    order 1     order 2  o2+cntSSEV2
    
    book1           208.930     208.816     208.457
    calgary.tar     717.536     717.534     717.648
    
    -----------------------------------------------
    A10_jpg         825.777     825.746     825.103
    acrord32_exe  1.365.452   1.366.837   1.364.701 
    english_dic     528.743     527.570     511.882
    FlashMX_pdf   3.664.719   3.666.136   3.667.273
    fp_log          577.632     566.588     551.677
    mso97_dll     1.721.252   1.723.313   1.722.229
    ohs_doc         796.151     795.906     790.500  
    rafale_bmp      737.777     737.493     738.618
    vcfiu_hlp       580.024     578.183     578.027
    world95_txt     474.250     472.709     473.811
    -----------------------------------------------
                 11.271.777  11.260.481  11.223.821
    


  15. #15
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I accidentally found results from scm 0.0.1b on maximumcompression.com and I don't know if I like that. Because of that I'll wait before releasing another binary, let alone the source.

Anyway, I added a final SSE stage. Whole contexts are still a problem because every file favors different ones, so performance on book1 is a little lower now, but SFC improved.

Comparison should be made to the first column of the previous post. I'm now working on LZP.

    Code:
    
    ------------------------ 
    A10_jpg          825.962    
    acrord32_exe   1.353.330    
    english_dic      523.688    
    FlashMX_pdf    3.664.151 
    fp_log           548.181 
    mso97_dll      1.711.070   
    ohs_doc          789.236
    rafale_bmp       738.546    
    vcfiu_hlp        571.824 
    world95_txt      469.564 
    ------------------------
                  11.195.552
    Last edited by Sebastian; 3rd December 2011 at 21:13.

  16. #16
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I've tried implementing a match model but did not really succeed.

LZP matches the longest context through an order-4 hash table.
The prediction that the next bit is one is then simply p=(matchbyte>>bitpos)&1, where matchbyte is the byte following the matched context.
This prediction is added to the mixer as long as we have not encountered a mispredicted bit; after a misprediction, p=0.5.

Things I tested:
- orders larger than 4
- adjusting p according to the length of the matched context
- a minimum match length larger than 1
- mixer-SSE has to be disabled because the final mixer does not like mixed distributions

The huge improvements I've seen from other authors when they incorporated a match model make me think I am doing something wrong - yet around 17 MB of the 20 MB of fp.log have a match length larger than 8, and 96% of that is predicted correctly.

Maybe Matt can run some small tests on the improvement from the match model.

For the following stats I've disabled SSE completely; only the least-squares cascade + least-cost final mix on SFC:

    Code:
          no match       match
    o2  15.427.171  13.985.950   9.3% 
    o3  13.146.938  12.896.157   1.9%
    o4  12.251.104  12.195.239   0.4%
    o5  11.988.078  11.939.141   0.4%
    o5 (MinLen=8)   12.019.778
    Last edited by Sebastian; 4th December 2011 at 21:24.

  17. #17
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
A few years ago I designed a bytewise coder with an o2 PPM and o4 and o8 LZP models. I wrote it primarily in assembly, but I've also done a Java port. I hope you'll find it useful: http://encode.su/threads/390-Java-port-of-TarsaLZP

Every encoded symbol was preceded by a match flag. When there was a match, the symbol wasn't encoded. When there was a mismatch, the predicted symbol was removed from the alphabet and the actual symbol was encoded with that reduced alphabet. That way the encoding does not have any redundancy. I think you can apply it to a bitwise coder as well.
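The mismatch case amounts to a rank/unrank over the 255-symbol reduced alphabet. A sketch (my naming, ignoring the actual probability coding):

```cpp
#include <cstdint>

// Flag + exclusion: on a match, send only the flag; on a mismatch, code the
// symbol's rank in the alphabet with the predicted symbol excluded, so no
// probability mass is wasted on a symbol we know it isn't.
struct Coded { int flag; int index; };   // index = -1 when flag == 1

inline Coded encodeSym(uint8_t sym, uint8_t predicted) {
  if (sym == predicted) return {1, -1};           // match: flag only
  int idx = sym < predicted ? sym : sym - 1;      // rank in 0..254
  return {0, idx};
}

inline uint8_t decodeSym(Coded c, uint8_t predicted) {
  if (c.flag) return predicted;
  return c.index < predicted ? (uint8_t)c.index : (uint8_t)(c.index + 1);
}
```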
    Last edited by Piotr Tarsa; 4th December 2011 at 21:15.

  18. #18
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Applying symbol exclusion to bitwise models is a pain in the ass. But thanks for pointing to your code.

  19. #19
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
I didn't use exclusion in paq9a's LZP preprocessor or ZPAQ's original min.cfg model (later abandoned), so compression got noticeably worse unless the minimum match length was very long. Currently ZPAQ does not have any built-in LZP but it does use match models which output a stretched probability with value proportional to 1/n or 1-1/n for length n, depending on whether the predicted bit is 0 or 1. This is mixed with other predictions in the usual way. The idea is that if a match succeeded in predicting the last n bits, then the probability that the next bit will be mis-predicted is k/n, where k depends on the source. For example, Witten and Bell did experiments and found k ~ 0.03 for runs of 0's and 0.05 for runs of 1's in text (for some context). So I just output a fixed prediction and let the mixer learn k.
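A sketch of that idea with an explicit k (in the real scheme the mixer weight effectively learns k, so the constant below is purely illustrative):

```cpp
#include <cmath>

inline double stretch(double p) { return std::log(p / (1 - p)); }
inline double squash(double s)  { return 1.0 / (1.0 + std::exp(-s)); }

// After an n-bit match, assume p(mispredict) = k/n and emit the stretched
// prediction for the next bit; longer matches give stronger predictions.
inline double matchPrediction(int n, int predictedBit, double k = 0.05) {
  if (n <= 0) return 0.0;              // no match: p = 0.5 stretches to 0
  double pCorrect = 1.0 - k / n;
  double p1 = predictedBit ? pCorrect : 1.0 - pCorrect;
  return stretch(p1);
}
```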

  20. #20
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
By LZP I only mean backward prediction, not flag-based coding.

    edit:

    I've changed ZPAQs mid.cfg to
    Code:
    comp 3 3 0 0 7 (hh hm ph pm n)
      0 icm 5        (order 0...5 chain)
      1 isse 13 0
      2 isse $1+17 1
      3 isse $1+18 2
      4 isse $1+18 3
      5 isse $1+19 4
      6 mix 16 0 6 24 255  (order 1)
    ...
Hope that was right... performance on book1 dropped from 208.699 to 227.054. Do you do some magic in the match model, or is my model already good enough?
    Last edited by Sebastian; 5th December 2011 at 00:04.

  21. #21
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
I think I found a/the trick... lol

I updated the hash table at every byte - that was bad. I thought that multiple hashes per hash entry could help here, but simply extending the old match did it.

Thanks Matt
    Last edited by Sebastian; 5th December 2011 at 10:19.

  22. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    You also need to take out the HCOMP code that computes the context for the MATCH component. Otherwise the MIX is getting the order 7 context for the MATCH. The following gives 210,554 for book1.zpaq

    Code:
    comp 3 3 0 0 7 (hh hm ph pm n)
      0 icm 5        (order 0...5 chain)
      1 isse 13 0
      2 isse $1+17 1
      3 isse $1+18 2
      4 isse $1+18 3
      5 isse $1+19 4
      6 mix 16 0 6 24 255  (order 1)
    hcomp
      c++ *c=a b=c a=0 (save in rotating buffer M)
      d= 1 hash *d=a   (orders 1...5 for isse)
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      b-- d++ hash *d=a
      d++ a=*c a<<= 8 *d=a       (order 1 for mix)
      halt
    post
      0
    end

  23. #23
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Thanks, ZPAQL is more complicated than C

How is the context for the mixer constructed - or rather, what does "d++ a=*c a<<= 8 *d=a" mean?

As you shift, is it something like (history[-1]<<8)+ctx, where ctx is the partially coded byte?

More tests in the morning. I'm getting used to the match model.
The context for the final mixer is now "(usedorder<<2)+t", where t={0,1,2,3} if MatchLen is {0,>0,>=8,>=32} respectively - and look what we got here:
So basically 5 lines of code got me about 9.5% compression improvement. Finally we are hitting the results of the first post with mixer-SSE by using final mix + match model, which is much faster.

    Code:
          no match      match  
    o2  15.427.171  12.527.564  18.8%
    Last edited by Sebastian; 5th December 2011 at 10:04.

  24. #24
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    The ZPAQL configuration file language is described here: http://mattmahoney.net/dc/zpaq.1.htm...guration_files

    Configuration file translation is mostly straightforward. 1 word usually translates to a single byte. The formal specification of the byte code can be found in the spec: http://mattmahoney.net/dc/zpaq1.pdf

    ZPAQL is designed to produce small code and execute fast, not to be easy to write code in.

    It is used in 2 places: for computing contexts and optionally for post-processing. In the config file, these are the HCOMP and PCOMP sections. If there is no post-processor, then it ends with POST 0 END for historical reasons. (When I was developing ZPAQ I was experimenting with a fixed set of post-processors, but figured a language would be better. I kept the syntax to not break old config files).

    A ZPAQL virtual machine has 32 bit registers A (accumulator), B, C, D (index registers), R0...R255, and a 1 bit condition flag F. There are 2 memory arrays whose sizes can be powers of 2: an array of 32 bit values H, and an array of 8 bit values M. B and C can be used as pointers to M, and D into H. Only the low bits are used to index these, so they can never be out of bounds. The machine state is set to all zeros at the start of a block and otherwise preserved between calls except for the input A.

    The HCOMP program is called once after each byte is modeled with that byte in the A register. The program writes its output to the first n elements of H and then executes HALT to return. Then H[i] is the context for component number i. Modeling is bitwise, so this context (usually a hash, but can also be direct) is combined with the 0 to 7 bits following the byte boundary for modeling purposes. This is done in different ways depending on the component type. The PCOMP section outputs using the OUT instruction. H has no special meaning.

    The line: COMP 3 3 0 0 7 (hh hm ph pm n)

    gives the sizes of the arrays H and M in the HCOMP and PCOMP sections respectively, and the number of components, n. The sizes are log2, so H and M in the HCOMP section each have 8 elements. libzpaq in 64 bit programs supports sizes up to 32 (16 + 4 GB). The line:

    c++ *c=a b=c a=0 (save in rotating buffer M)

    increments C, stores A (the input byte) in M[C], assigns C to B, and sets A to 0. I use M as a rotating buffer of the last 8 bytes and C as a pointer to the current byte. I use B to retrieve bytes from the buffer and compute hashes. The line:

    d= 1 hash *d=a

    sets D to 1 (so it points to the context for component number 1) computes a hash in A and then writes the result to H[D]. The HASH instruction computes A=(A+*B+512)*773. At this point it is just an order 1 hash of the byte just stored. I left H[0] as 0 so it is the order 0 context. (Note that "d= 1" has a space after the "=" but not before. This is required because it is a 2 byte instruction. "d=" translates to the first byte and "1" translates to the second. This was not required for "a=0" because there is a 1 byte ZPAQL opcode to clear a register). The line repeated 4 times:

    b-- d++ hash *d=a

    decrements B, increments D, updates the hash in A from M[B], and stores it in H[D]. Thus, it computes the order 2 through 5 context hashes for the next 4 components. The line:

    d++ a=*c a<<= 8 *d=a

    computes a non-hashed 16 bit context for the MIX. Recall that C still points to the last input byte stored in M. The code retrieves it into A, shifts it left 8 bits (note the space before 8; this is a 2 byte opcode), and stores it in H[D]. For a MIX, the bitwise context is computed by adding the 0 to 7 bits with a leading 1 bit to H[D]. So the contexts actually seen by the MIX are:

    aaaaaaaa00000001
    aaaaaaaa0000001x
    aaaaaaaa000001xx
    ...
    aaaaaaaa1xxxxxxx

    where aaa... is the previous byte (originally input from A) and xxx... is the bits modeled after it. The size of the MIX context is from the line in the COMP section:

    6 mix 16 0 6 24 255

    It says that component 6 (they must be in order starting from 0) is a MIX with a 16 bit context, taking input from the range of 6 components starting at 0, with a learning rate of 24, and a bit mask of 255 ANDed to the bitwise context (1xxxxxxx). The other components are:

    0 icm 5

Indirect context model (context -> bit history -> prediction) with a hash table size of 4*2^5 rows of 16 bytes each, or a total of 64*2^5 bytes. It interprets the low 5+2 bits of H[0] as an index into the hash table and the next higher 8 bits as a checksum for detecting collisions. (Actually it uses h=H[0]+16 and h=H[0]+16*(1xxxx) as the initial hashes for the 2 nibbles, where xxxx is the first nibble, then tries h, h^0x1, h^0x2.) It turns out that for order 0, 5 is the smallest table that never ejects a row due to hash collisions. The components like:

    2 isse $1+17 1

    are ISSE (indirect secondary symbol estimators). These map a context to a bit history just like ICM with the same kind of hash table. But then the bit history selects a pair of weights from a 256x2 table to mix the stretched prediction from another component with a fixed constant 1 and output a new prediction. The parameter $1+17 selects a table size of 4*2^17 rows. Thus, the low 19 bits of H[2] is the context, and the next higher 8 bits is the checksum. The input prediction comes from component 1. The ISSE components are arranged in a chain from lower to higher orders. The $1 means that if you pass an argument to the config file, then it will be added to the 17. You can always write "$m+n" anywhere in a config file that a number n is expected, and the m'th argument will be added to it. (No other arithmetic operations are supported). An argument can be negative. Thus:

    zpaq -mmid,-2,3 c archive file

    would substitute -2 for $1 and 3 for $2. You can have parameters up to $9.

    ZPAQL is not case sensitive. Parenthesis denote comments and can be nested.

  25. #25
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Special thanks to you

    Aren't you tired of answering the same questions all over again?

I'll stop bothering you and look into the vast amount of material from you and others.

I also tried using the full o1 context for the final mixer (as this is most often a good context), but that did not help - or it's a matter of parameter optimization.

  26. #26
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Order-1 mixer isn't always the best. You can experiment. Some day I might write an optimizing compressor that tries lots of different parameters on a file and picks the best one. It would be very slow of course. The search space is impossibly big.

  27. #27
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
That's one thing I am trying to do: no context optimization, but online parameter optimization. Well, scm is too slow at the moment, and I don't like hashing, but a more accurate structure would consume even more memory.

Some more tests:

The final mix suffers a bit from mixed distributions (maybe a logistic mix would help here) and its improvements are not as impressive as before. fp.log got nearly 100 KB better.
Results look like a hammer vs. the Swiss-army-knife ZPAQ, but anyway.

Same params as before, using a 4 MB match model with a 512 KB hash and an order-8 CM with a 2^28-element hash; each SSE uses 32 vertices. As said before, results with the match model are as good as results without it but with mixer-SSE. Timings were done on a 3.6 GHz C2D... and look, it's slow as hell.

    Code:
                              +mixer-sse
    A10_jpg          827.717     825.936
    acrord32_exe   1.375.568   1.349.392
    english_dic      565.834     520.474
    FlashMX_pdf    3.663.091   3.659.840
    fp_log           479.205     464.095
    mso97_dll      1.734.100   1.709.712
    ohs_doc          797.178     777.017
    rafale_bmp       741.500     738.769
    vcfiu_hlp        552.314     539.179
    world95_txt      467.036     459.309
    ------------------------------------
                  11.203.543  11.043.723
    time               05:59       10:29
Next on the list is FSM counters, where I'm halfway through.


  28. #28
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
All modules can now be selected through "profile.txt", which looks like this (comments can be used as well). This introduces many branches per bit, but lets me discover data dependencies.

For example, the match model improves through SSE, which is senseless but shows that the estimate is improvable. Some contexts were slightly improved.

    Code:
    memhash=29
    maxorder=8
    
    counter_rate=32
    
    mixer_rate=0.20,0.15,0.10,0.05,0.02,0.02,0.01
    mixer_init=0.10
    
    mixer_sse=0
    mixer_sse_trust=0.60 
    mixer_sse_rate=0.010
    
    counter_sse=0
    counter_sse_trust=0.60
    counter_sse_rate=0.010
    
    match_model=1
    match_win_size=22
    match_hash_size=18
    
    final_mix_rate=0.001
    
    final_sse=1
    final_sse_trust=0.75
    final_sse_rate=0.020
Intermediate SSE gets more and more useless as the probability estimates get better.

    Hitting around 11.150.xxx on sfc and 20.912.xxx on enwik8.

order 2 + match model gets around 250k on book1
    Last edited by Sebastian; 6th December 2011 at 23:15.

  29. #29
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    287
    Thanks
    9
    Thanked 33 Times in 21 Posts
Experimenting with building appropriate FSM counters.

So far I have one with exact bit history up to n bits. This one needs about 2^12 states to be in the range of the frequency counter.

And one which approximates the frequency counter and needs about 2^7 states.

Still, the original frequency counter is a bit better (I am losing around 50k on SFC) because I did not try to emulate its specific rescaling structure, which happens between (8,..,16) input bits.

I'll try to merge the two, which will end up in something like paq counters, I guess.

Or I'll build one with exact bit histories up to 16 bits and merge from large to low states until the desired maximum state count is reached.

And one question for the compiler specialists: why does my program crash if I set #pragma pack(1) just for the fun of it?

  30. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    > crash if I set #pragma pack(1)

    http://www.cplusplus.com/forum/general/14659/
pragma pack only affects the alignment of structure fields.
So the only real reason for crashing would be something like an aligned vector operation on an unaligned field.

    But I suppose random bugs (overflows etc) can show up because of pack(1) too.
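A small illustration of what pack(1) actually changes (field offsets and struct alignment; whether it faults then depends on the compiler emitting alignment-assuming loads):

```cpp
#include <cstddef>
#include <cstdint>

// Same fields, different layout: pack(1) removes the padding before d, so
// Packed::d sits at offset 1 and the struct's alignment drops to 1. Code
// that loads d assuming natural 8-byte alignment (e.g. an aligned SIMD
// load, or plain loads on strict-alignment CPUs) can then fault.
struct Normal { uint8_t tag; double d; };

#pragma pack(push, 1)
struct Packed { uint8_t tag; double d; };
#pragma pack(pop)
```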

    Normally a crash is a good thing though, because you're supposed to get an exception code and address.
