
Thread: CMV

  1. #31
     Matt Mahoney (Expert, Melbourne, Florida, USA)
    zpaq does not use dictionary preprocessing. I am pretty sure that nanozip doesn't either, but it is closed source.

  2. Thanks:

    Mauro Vezzosi (2nd October 2015)

  3. #32
     Member (Italy)
    CMV has 2 very simple "hidden" models, not mentioned in the help because they are always enabled: they predict fixed probabilities of 1/64 and 1-1/64.
    There are 2 of them, not 1, so that the mixers are neutral at the first prediction for every context.
    I tried other values, both static and "simple dynamic", but 1/64 and 1-1/64 seem to be good values on average (at least for cmv; tune your own best values).
    I hope this information helps someone else improve their compressor a little.
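    Why two symmetric fixed inputs leave a mixer neutral can be shown in a few lines of Python (a toy illustration with made-up weights, not CMV's actual code): in the stretched logistic domain, stretch(1/64) and stretch(1-1/64) are exact opposites, so with equal initial weights their contributions cancel.

```python
import math

def stretch(p):
    """Logistic stretch: ln(p / (1 - p)), the domain logistic mixers work in."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))

# The two fixed-probability inputs are symmetric around 0.5,
# so their stretched values are exact opposites.
p_lo, p_hi = 1.0 / 64.0, 1.0 - 1.0 / 64.0
assert abs(stretch(p_lo) + stretch(p_hi)) < 1e-9

# A minimal logistic mix with equal initial weights (weights are
# illustrative): the two fixed inputs cancel, so the very first
# mixed prediction is neutral (0.5).
weights = [0.3, 0.3]
mixed = squash(weights[0] * stretch(p_lo) + weights[1] * stretch(p_hi))
print(round(mixed, 6))  # -> 0.5
```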
    The following are the tests on the Maximum Compression corpus; only 2 file sizes are better without these 2 models (shown in bold).
    First row: without these 2 models. Second row: with these 2 models.
    Code:
    Options: -m0         -m1         -m2     -m2,0,+     -m2,0,*     Original
         823.793     821.889     821.814     821.288     821.454      842.468   A10.jpg
         824.982     821.357     821.163     820.696     820.749      842.468   A10.jpg
    
       1.442.518   1.085.529   1.060.145   1.032.956   1.022.082    3.870.784   AcroRd32.exe
       1.426.255   1.078.743   1.055.369   1.028.814   1.019.353    3.870.784   AcroRd32.exe
    
         523.543     433.651     419.727     408.433     403.585    4.067.439   english.dic
         516.241     422.134     418.248     407.492     403.026    4.067.439   english.dic
    
       3.656.148   3.597.809   3.591.910   3.581.437   3.578.156    4.526.946   FlashMX.pdf
       3.659.901   3.595.144   3.589.644   3.579.920   3.577.000    4.526.946   FlashMX.pdf
    
         426.329     319.375     303.508     291.295     281.545   20.617.071   FP.LOG
         425.495     317.694     302.262     290.766     281.065   20.617.071   FP.LOG
    
       1.802.769   1.456.646   1.437.293   1.406.147   1.394.586    3.782.416   MSO97.DLL
       1.787.234   1.447.944   1.431.077   1.400.887   1.391.254    3.782.416   MSO97.DLL
    
         771.103     703.874     696.319     691.787     689.985    4.168.192   ohs.doc
         768.606     701.802     694.571     690.376     688.787    4.168.192   ohs.doc
    
         759.529     695.045     690.190     668.721     654.441    4.149.414   rafale.bmp
         759.113     691.838     687.494     667.781     653.804    4.149.414   rafale.bmp
    
         556.788     431.005     416.891     406.896     401.914    4.121.418   vcfiu.hlp
         549.918     426.997     414.090     404.962     400.620    4.121.418   vcfiu.hlp
    
         484.130     401.472     388.999     383.430     375.359    2.988.578   world95.txt
         482.506     399.994     387.770     382.490     374.695    2.988.578   world95.txt
    
      11.246.650   9.946.295   9.826.796   9.692.390   9.623.107   53.134.726   Total
      11.200.251   9.903.647   9.801.688   9.674.184   9.610.353   53.134.726   Total
    
      11.441.007  10.005.854   9.888.693   9.759.177   9.694.053   53.144.064   Tarball
      11.392.922   9.961.876   9.861.946   9.740.483   9.680.949   53.144.064   Tarball
    ----------
    I found better options for the DNA file SRR062634.filt.fastq (1, 2): new file size 14.897.825 (-m2,3,0x00a968fd), 0,41% smaller than the previous 14.958.423 (-m2,3,0x03ededff (>&b12)).
    ----------
    I found 2 bugs in the word model; they don't corrupt the .cmv file, but they hurt the compression ratio:
    - The model for the context of bytes >= 128 sometimes predicts values badly.
    - Order 2 of the word model predicts like order 0.
    A further improvement to make: inexplicably, I had tested the word model only with CM, but ICM appears to be better (and needs less memory).
    Last edited by Mauro Vezzosi; 2nd October 2015 at 00:51. Reason: Text tuning

  4. #33
     Member (Italy)
    CMV 0.1.1 is compatible with version 0.1.0 and doesn't change compression ratio or speed.
    History.txt:
    Code:
    0.1.1 - 2016/01/10
    - Added 64 bit version CMV64.exe.
    - Added "a" command to "analyse" the source file.
      It tries to find and disable the models that hurt the compression ratio.
      See switch "-a".
    - Added combinations of commands:
      - Analyse then Compress: ac InputFile CmvFile
      - Compress then Expand: ce InputFile CmvFile OutputFile
      - Compress then Compare: co InputFile CmvFile InputFile
      - Analyse then Compress then Expand: ace InputFile CmvFile OutputFile
      - Analyse then Compress then Compare: aco InputFile CmvFile InputFile
    - Added switch "-a" to set some parameters for the "a" command.
      The default is -a3,M1,0.
      From the help of cmv (some examples are in the Readme.txt file):
      a[<C>][<N>[,<N>[,<N>]]]: Analyse.
         [<C>][<N>]: Options. [<N>]: Size. [<N>]: How many best.
         Options:
         <C>:
            m: Enables maximum options also for the method (need 'x').
            x: Enables maximum options.
         <N>:
            Bit      0: Method and memory. 0: Disable. 1: Enable.
            Bit      1: Primary models. 0: Disable. 1: Enable.
            Bit      2: Secondary models, mixers, other. 0: Disable. 1: Enable.
            Bit      3: Loop until no improvement. 0: Disable. 1: Enable.
         Size: Analyse the first <N> bytes of the file. <N>: Size (0:full file size).
         How many best: How many partial best options retest with full size (0..16).
    - Added switch "-mx" to enable all models and set maximum memory (a shortcut for "-m2,3,>").
    - Added switches "-max" and "-amx" to enable maximum analysis, enable all models and set maximum memory (shortcuts for "-ax -mx").
    - Now the final compressed size also includes the header size (it is the full file size).
      Use -vf to also display the raw size.
    
    - Moved from g++ (GCC) 3.4.2 (mingw-special) to g++ (x86_64-win32-sjlj-rev0, Built by MinGW-W64 project) 5.2.0.
    - Added Help.txt, History.txt and Readme.txt.
    - Benchmarks.txt:
      - "Maximum Compression":
        - Added in "cmv" section the "-mx" column.
        - Added in "paq8pxd16 -s0 + cmv" section the "-m2,0,*" column.
        - Added in "paq8pxd16 -s0 + cmv" section the "Tarball" and "Tarball / Total" values.
        - Updated the "Best overall" column.
      - "Silesia Open Source Compression Benchmark":
        - Added in "cmv" section the "-mx" column.
        - Added in "cmv" section the "Tarball" and "Tarball / Total" values.
        - Updated the new best value for x-ray (from 3.571.523 "-m1,0,0x00a3619f" (*^0x034e9400) to 3.570.952 "-m1,3,0x00ab647f").
        - Updated the "Best overall" column.
      - "Lossless Photo Compression Benchmark":
        - Added the lacking values of the column "-m2,2,*".
      - "Specific case - High redundant data in a pattern":
        - Updated the new best value for "LOG.txt" (from 23.009 "-m2,0,0x01ec039f" (^0x02000008) to 19.897 "-m0,0,0x036c29fe").
        - Updated the new best value for "NUM.txt" (from 819 "-m2,0,0x00ec8bd9" (^0x0300884e) to 726 "-m1,2,0x01e1d9fa").
      - "Compression Competition -- $15,000 USD":
        - Updated the new best value for "SRR062634.filt.fastq" (from 14.958.423 "-m2,3,0x03ededff" (>&b12) to 14.897.825 "-m2,3,0x00a968fd").
      - "Calgary corpus (14 files)":
        - Added in "cmv" section the "-max" column.
      - "Canterbury corpus":
        - Added in "cmv" section the "-max" column.
      - Added "Squeeze Chart":
        - Added files "squeezechart_app.tar (tarball)", "squeezechart_gutenberg.tar (tarball)" and "squeezechart_installer.tar (tarball)".
        - Added "-mx" column.
      - Added "Wratislavia XML Corpus":
        - Added "-m0", "-m1", "-m2", "-mx" columns.
    
    0.1.0 - 2015/09/06
    - First public release.
    If you test cmv, take a look at Readme.txt for some more info.
    EDIT: The "a" command can take a long time because it uses a brute-force search.
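    The brute-force search over model-enable bits might look roughly like this greedy sketch (compressed_size is a hypothetical oracle that compresses with a given model mask; CMV's real search is its own):

```python
def analyse(compressed_size, n_bits, loop=True):
    """Greedy brute-force search over model-enable bits (toy sketch).

    Flip each model bit off in turn; keep the flip if the compressed
    output got smaller; optionally loop until a full pass yields no
    improvement (like the "loop until no improvement" option bit).
    """
    mask = (1 << n_bits) - 1          # start with every model enabled
    best = compressed_size(mask)
    improved = True
    while improved:
        improved = False
        for bit in range(n_bits):
            if not mask & (1 << bit):
                continue              # already disabled
            trial = mask & ~(1 << bit)
            size = compressed_size(trial)
            if size < best:           # disabling this model helped
                mask, best, improved = trial, size, True
        if not loop:                  # single pass only
            break
    return mask, best

# Toy oracle: pretend the model at bit 2 costs 10 bytes when enabled.
toy = lambda m: 1000 + (10 if m & 4 else 0)
print(analyse(toy, 4))  # -> (11, 1000): bit 2 disabled
```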

    CMV-00.01.01.zip includes Benchmarks.txt.
    Last edited by Mauro Vezzosi; 13th January 2016 at 00:26. Reason: See "EDIT"

  5. Thanks:

    Skymmer (11th January 2016)

  6. #34
     Member (Italy)
    Blog
    Code:
    Next info 
    
    2017/08/06
    - Made 0.2.0 beta 1 standard and extreme versions.
    
    2017/08/06
    - Added the model "Previous seen bit", order 0 (history 1 and 2), 1 and 2.
      Always enabled only for methods >= 1.
    
    2017/07/29
    - Removed Bit 28 ((mixer2(mixerN_fast_lr(), mixerN_slow_lr()))) and bit 29 (mixerN SSE) from "*" methods options (they aren't models).
    
    2017/07/26
    Benchmarks for version 0.1.1, Squeeze Chart (PDF, JPG, MP3, PNG, Installer.. (Compressing Already Compressed Files)), results not verified.
        -m1,3,+     -m2,0,+     -m2,0,*
                                        Documents
      5.445.616   5.452.943   5.446.604 busch2.epub
     87.474.099  84.864.362             diato.sf2
     38.532.255  38.534.942             freecol.jar
     12.582.707  12.571.833             maxpc.pdf
     85.128.562                         SoniMusicae-Diato-sf2.zip
        791.766     791.372     791.405 squeeze.xlsx
                                        Image Formats (Camera Raw)
     22.065.306  22.043.852             canon.cr2
     11.325.973  11.109.503             fuji.raf
     33.826.108                         leica.dng
     32.532.433  32.532.436             nikon.nef
     15.862.814  15.859.052  15.856.336 oly.orf
     16.110.809                         pana.rw2
     11.947.899  11.978.628             sigma.x3f
     15.032.909  14.945.507  14.669.162 sony.arw
     20.438.149  20.288.840             sony2.arw
                                        Image Formats (web)
      1.782.552   1.781.843   1.773.735 filou.gif
      4.638.655   4.642.178   4.639.766 flumy.png
      6.828.506   6.834.776   6.828.513 mill.jpg
                                        Installers
    101.253.297 102.250.965             amd.run
    189.818.215 184.804.805             cab.tar
     54.115.084  54.089.566             inno.exe
     18.551.343  18.528.542  18.526.117 setup.msi
     53.701.756  54.112.501             wise.exe
                                        Interactive files
        340.557     340.066             flyer.msg
      5.900.341   5.898.822             swf.tar
                                        Scientific Data
         26.924      14.915       7.167 block.hex
    167.944.593 147.688.356             msg_lu.trace
    116.023.192 110.989.967             num_brain.trace
     36.433.304  34.696.885             obs_temp.trace
                                        Songs (Tracker Modules)
      6.645.290   6.501.083   6.456.087 it.it
     15.791.191  15.354.477             mpt.mptm
      7.428.792   7.286.070   7.180.942 xm.xm
                                        Songs (web)
     19.246.345  19.245.422             aac.aac
     16.508.192  16.520.159  16.507.079 diatonis.wma
    127.728.723 127.419.074             mp3corpus.tar
     36.334.963  36.332.287             ogg.ogg
                                        Videos (web)
     15.142.319  15.139.010  15.133.624 a55.flv
     32.153.051  32.151.082             h264.mkv
    162.890.183 162.883.854             star.mov
     96.324.466  96.480.593             van_helsing.ts
        -m1,3,+       -m2,3         -mx Squeeze Chart
     68.364.894  67.828.879  65.646.788 squeezechart_app.tar (tarball)
    
    2017/07/23
    - -m2,,- (-m2 with fewer options): changed Order 0..1 bit history (bit 24-25) from 0 to 1.
    
    2017/05/23
    Benchmarks for version in development (2017/05/23 >0.2.0a5, VOM model is disabled, mod_ppmd enabled), compared to the last official (0.1.1) and previous development (2016/06/08 0.2.0 ? and 2016/11/28 >0.2.0a3 (VOM model is disabled)) versions.
         0.1.1 -mx   0.2.0 ? -mx >0.2.0 a3 -mx >0.2.0 a5 -mx Maximum Compression
        2016/01/10    2016/06/08    2016/11/28    2017/05/23
           820.501       819.931       819.582       819.172 A10.jpg
         1.017.441       997.721       976.062       974.342 AcroRd32.exe
           400.343       381.471       375.731       371.535 english.dic
         3.574.241     3.566.950     3.555.244     3.554.780 FlashMX.pdf
           280.132       267.665       258.477       256.850 FP.LOG
         1.387.079     1.365.962     1.338.128     1.335.340 MSO97.DLL
           687.731       683.352       678.767       677.244 ohs.doc
           653.728       634.380       633.578       633.062 rafale.bmp
           399.793       391.522       386.441       385.870 vcfiu.hlp
           372.831       362.592       356.292       352.845 world95.txt
         9.593.820     9.471.546     9.378.302     9.361.040 Total
         9.632.713     9.515.604     9.386.444     9.366.438 MaxCompr.tar (very close to single file compression total :-))
                                       100.031       100.031 sharnd_challenge.dat
                                            34            34 Test_000
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    >0.2.0 a5 -mx Large Text Compression Benchmark
    2016/01/10     (>&b12)   2016/06/08      2016/11/28       2017/05/23
                        24           24              24               24 ENWIK0
                        31           32              32               32 ENWIK1
                        91           89              88               89 ENWIK2
                       299          287             284              286 ENWIK3
                     2.997        2.953           2.922            2.925 ENWIK4
                    25.568       25.248          25.076           24.998 ENWIK5
                   214.137      211.478         209.352          208.326 ENWIK6
                 1.968.717    1.941.513       1.915.141        1.900.749 ENWIK7
                18.153.319   17.898.994      17.650.885       17.325.679 ENWIK8
                                                              16.677.314 ENWIK8.drt (-m2,3,0x53e90df7)
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    >0.2.0 a5 -mx Other
    2016/01/10     (>&b12)   2016/06/08      2016/11/28       2017/05/23
                   194.417      192.013         191.081          190.221 book1
                   613.782      605.705         599.057          597.065 Calgary corpus.tar
                   617.996                      603.261          601.940 Calgary corpus.tar.paq8pxd16
                   331.931      327.901         325.392          324.517 Canterbury corpus.tar
                   333.145                      326.907          326.382 Canterbury corpus.tar.paq8pxd16
                                                       >0.2.0 a5 -m2,1,* Squeeze Chart - http://www.squeezechart.com
                                                              2017/05/23
                                                               6.684.067 MKVtoonix-GUI 10.0 64 Bit
                 0.1.1 -mx                                 >0.2.0 a5 -mx Wratislavia XML Corpus - http://pskibinski.pl/research/Wratislavia/
                2016/01/10                                    2017/05/23
                 1.103.033                                     1.074.470 shakespeare.xml
                    56.836                                        54.461 uwm.xml
    0.1.1 -m2,3,0x03ededff                >0.2.0 a3 -mx    >0.2.0 a5 -mx 10 GB Compression Benchmark
    2016/01/10     (>&b12)                   2016/11/28       2017/05/23
                33.024.880                   33.072.180       32.621.364 100mb.tar (100mb subset) (tarball)
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    >0.2.0 a5 -mx Compression Competition -- $15,000 USD
    2016/01/10     (>&b12)   2016/06/08      2016/11/01       2017/05/23
                14.897.825   14.782.006      14.737.701       14.710.428 SRR062634.filt.fastq
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    >0.2.0 a5 -mx Specific case - High redundant data in a pattern
    2016/01/10     (>&b12)   2016/06/08      2016/11/28       2017/05/23
                     1.242        1.282           1.200            1.235 NUM.txt
    0.1.1 -m2,3,0x03ededff   PPMonstr J   >0.2.0 a3 -mx    >0.2.0 a5 -mx Generic Compression Benchmark
    2016/01/10     (>&b12)   2006/02/16      2016/11/28       2017/05/23
                 2.923.969    3.003.153       2.865.327        2.858.429 uiq2-seed'0'-n1000000
                0,97363305   1,00000000      0,95410623       0,95180932 Size (Ratio)
    Prime Number Benchmark
       msb     lsb     dbl     hex    text   twice interlv   delta bytemap  bitmap    total  tarball tarball/total (total = 4.803.388; tarball = 4.812.800, 7z.exe a -ttar)
    39.547  39.314  40.403  41.851  39.019  39.569  40.831  34.251  25.789  32.110  372.684  292.304 0,78432130169 cmv -mx  0.1.1
    39.379  39.181  40.011  41.629  38.829  39.400  40.548  33.999  23.851  31.659  368.486  290.754 0,78905033027 cmv -max 0.1.1
    38.233  37.987  38.897  41.233  38.301  38.261  39.570  33.944  24.149  31.909  362.484  284.835 0,78578640712 cmv -mx  0.2.0 (2016/06/08)
    37.963  37.679  38.711  40.893  38.131  37.995  39.136  33.925  22.864  31.895  359.192  281.555 0,78385654469 cmv -mx  >0.2.0 a3 (2016/11/28)
    37.834  37.510  38.600  40.737  38.016  37.863  39.133  33.856  22.880  31.850  358.279  280.948 0,78415983075 cmv -mx  >0.2.0 a5 -mx (2017/05/23)
    37.835  37.973  38.597  41.619  38.829  37.894  36.255  33.726  23.851  29.527  356.106                        Best overall (2016/01/14)
        nz      nz      nz    cmix     cmv      nz      nz    cmix     cmv    cmix
     0.1.1 -mx   0.1.1 Optimal   0.2.0a3 ICM1 Opt.   0.2.0a3 Extreme Opt.   >0.2.0 a3 -mx   >0.2.0 a5 -mx   >0.2.0 a5Opt. Darek's testbed
    2016/01/10      2016/01/10          2016/09/27             2016/09/27      2016/11/01      2017/05/23      2017/05/23
     1.404.493       1.399.464           1.381.660              1.380.299       1.386.222       1.384.535       1.380.937 0.WAV
       323.277         322.990             303.175                301.866         303.864         303.490         302.942 1.BMP
       835.277         834.675             744.125                734.102         745.829         744.815         743.806 A.TIF
       780.311         779.596             692.949                683.189         693.092         692.099         690.734 B.TGA
       327.568         325.838             318.478                318.919         321.261         321.066         318.102 C.TIF
       310.952         310.480             303.235                303.056         303.979         303.659         302.614 D.TGA
       497.089         496.837             496.142                494.695         496.487         496.368         496.055 E.TIF
       110.914         110.799             110.755                110.571         110.809         110.798         110.744 F.JPG
     1.367.023       1.366.851           1.358.405              1.356.314       1.357.977       1.356.525       1.356.251 G.EXE
       482.720         482.701             462.836                463.150         461.994         460.785         460.361 H.EXE
       227.898         227.791             217.797                217.981         217.765         217.459         217.096 I.EXE
        43.364          43.325              43.172                 43.179          43.173          43.178          43.151 J.EXE
     2.618.200       2.616.775           2.540.011              2.536.718       2.534.630       2.531.528       2.532.372 K.WAD
     2.830.322       2.829.223           2.751.490              2.745.121       2.751.856       2.748.195       2.746.108 L.PAK
        55.438          55.027              52.420                 51.990          52.617          52.376          51.634 M.DBF
        86.689          86.688              83.967                 83.837          83.811          83.553          83.457 N.ADX
         3.777           3.775               3.672                  3.668           3.670           3.678           3.669 O.APR
           947             938                 878                    879             912             909             880 P.FM3
       187.547         187.547             170.171                168.828         169.843         169.048         168.401 Q.WK3
        29.327          29.245              28.781                 28.768          28.782          28.725          28.632 R.DOC
        26.020          25.970              25.447                 25.435          25.418          25.359          25.261 S.DOC
        18.752          18.727              18.367                 18.367          18.285          18.274          18.232 T.DOC
         8.681           8.646               8.536                  8.532           8.530           8.535           8.506 U.DOC
        18.570          18.519              18.213                 18.206          18.198          18.165          18.072 V.DOC
        13.228          13.185              13.007                 13.002          12.995          12.988          12.930 W.DOC
        11.049          10.997              10.854                 10.857          10.863          10.848          10.792 X.DOC
           323             320                 314                    314             318             318             315 Y.CFG
           176             171                 166                    166             169             171             166 Z.MSG
    12.619.932      12.607.100          12.159.023             12.122.009      12.163.349      12.147.447      12.132.220 Total
                                                                               12.166.965      12.152.547                 Testbed.tar (quite close to single file compression total :-))
    0.1.1 -m2,3,0x03ededff   >0.2.0 a5 -mx Other 2
    2016/01/10     (>&b12)      2017/05/23
                   175.282         160.565 AIMP_free.tga
                 5.265.328       5.215.493 FFADMIN.EXE
                    46.949          43.339 _FOSSIL_
     0.1.1 -mx   >0.2.0 a5 -m2,0,+|0x18000000   >0.2.0 a5 -m2,0,* Squeeze Chart (Txt Bible (Compressing Text In Different Languages))
    2016/01/10                     2017/05/23          2017/05/23
       693.479                        679.334             671.851 afri.txt
       746.900                        729.857             722.773 alb.txt
       624.237                        612.327             608.960 ara.txt
       705.168                        690.952             687.040 chi.txt
       922.105                        903.396             891.634 cro.txt
       749.296                        733.615             728.593 cze.txt
       695.223                        681.642             675.363 dan.txt
       729.682                        715.570             707.159 dut.txt
       659.272                        646.581             638.960 eng.txt
       649.197                        637.874             629.684 esp.txt
       723.353                        709.038             703.339 fin.txt
       692.336                        677.031             669.252 fre.txt
       717.826                        702.885             696.772 ger.txt
       221.765                        218.525             212.385 gre.txt
       625.971                        617.692             605.519 heb.txt
       782.307                        764.490             760.277 hun.txt
       729.490                        715.866             707.533 ita.txt
       637.865                        623.902             621.463 kor.txt
       816.845                        800.804             791.440 lat.txt
       676.332                        660.210             653.353 lit.txt
       678.285                        666.325             656.289 mao.txt
       694.682                        681.233             674.540 nor.txt
       712.650                        699.289             693.137 por.txt
       712.598                        697.016             690.182 rom.txt
       744.556                        725.742             721.408 rus.txt
       708.648                        695.153             687.979 spa.txt
       758.391                        740.013             731.808 swe.txt
       687.045                        671.278             663.161 tag.txt
       778.152                        750.736             748.526 thai.txt
       662.605                        645.354             640.555 turk.txt
       712.785                        696.119             692.605 vie.txt
       715.801                        699.326             692.804 xho.txt
    22.364.847                     21.889.175          21.676.344 Total
    
    2017/??/??
    - Added Shelwien's mod_ppmd to the already existing PPM low-orders model (bit 27).
      Method: 0: Order 4. 1: Order 6. 2: Order 16.
      Memory:        -m,0   -m,1   -m,2   -m,3.
           -m0: Mb      8     16     32     64.
           -m1: Mb     32     64    128    256.
           -m2: Mb    128    256    512   1024.
      (Memory = MB(8 << (method * 2 + memory)))
    - Changed ICM (indirect context models): from 5+5 counters + 3 bit history and "full" mixer to 6+6 counters + 2 bit history and quantized mixer.
    - Extreme version: added SSE in the mixer of 2 models (main model (mixallmixmixp3[]) and Gap model 2) when bit 28 and 29 are switched on.
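    The mod_ppmd memory formula above can be checked in a few lines (the function name is mine, not CMV's):

```python
def ppmd_memory_mb(method, memory):
    """Memory in MB per the stated formula: MB(8 << (method * 2 + memory))."""
    return 8 << (method * 2 + memory)

# Reproduce the table from the changelog entry.
for method in range(3):
    print(f"-m{method}:", [ppmd_memory_mb(method, mem) for mem in range(4)])
# -m0: [8, 16, 32, 64]
# -m1: [32, 64, 128, 256]
# -m2: [128, 256, 512, 1024]
```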
    
    2017/01/17
    - Made 0.2.0 alpha 5 standard and extreme versions.
    
    2017/01/??
    - Extreme version: Final N input mixer (bit 20-21, option 2): chained the predictions of the second layer of the mixers tree 15->3->1.
    
    2017/01/05
    - Extreme version: Final N input mixer (bit 20-21, option 2): 20 bits precision was wrong (see 2016/12/29), decreased to 19 bits.
    
    2016/12/31
    - Extreme version: Final N input mixer (bit 20-21, option 2): changed the mixers tree from 6->3->1 to 15->3->1 (experimental change; it may be modified or removed in the future).
    
    2016/12/29
    - Extreme version: Final N input mixer (bit 20-21, option 2): more precision in mixer2 (from 16 to 20 bits precision) (mixer2(mixerN_fast_lr(), mixerN_slow_lr())) (very small gain).
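    The mixer2(fast, slow) idea can be sketched as two logistic mixers sharing the same inputs but using different learning rates, blended by a third small mixer (illustrative Python; all names, rates, and weights are made up, not CMV's implementation):

```python
import math

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    return math.log(p / (1.0 - p))

class LogisticMixer:
    """One logistic mixer: weighted sum of stretched inputs, squashed;
    weights updated by gradient descent at a fixed learning rate."""
    def __init__(self, n, lr):
        self.w = [0.0] * n
        self.lr = lr
    def mix(self, stretched):
        self.x = list(stretched)
        self.p = squash(sum(w * s for w, s in zip(self.w, self.x)))
        return self.p
    def update(self, bit):
        err = bit - self.p
        self.w = [w + self.lr * err * s for w, s in zip(self.w, self.x)]

# A fast-adapting and a slow-adapting mixer over the same inputs,
# combined by a third 2-input mixer: the mixer2(fast, slow) pattern.
fast = LogisticMixer(2, lr=0.1)
slow = LogisticMixer(2, lr=0.002)
final = LogisticMixer(2, lr=0.01)

def predict(stretched_inputs):
    pf = fast.mix(stretched_inputs)
    ps = slow.mix(stretched_inputs)
    return final.mix([stretch(pf), stretch(ps)])

def learn(bit):
    final.update(bit)
    fast.update(bit)
    slow.update(bit)

# Train on an all-ones bit stream: the prediction climbs above 0.5,
# with the fast mixer's weights moving much faster than the slow one's.
for _ in range(200):
    p = predict([1.0, 1.0])
    learn(1)
print(p > 0.5)  # -> True
```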
    
    2016/12/??
    - I made some attempts to improve the mixer, but I failed except in weight initialization (very small gain).
    - Final N input mixer (bit 20-21, option 2): changed (fixed) the initialization of the second-level mixer (now it seems to be a little bit worse).
    
    2016/11/28
    Benchmarks for version in development (2016/11/28 >0.2.0a3, VOM model is disabled), compared to the last official (0.1.1) and previous development (2016/06/08 0.2.0 ?) versions.
                 0.1.1 -mx  0.2.0 ? -mx   >0.2.0 a3 -mx Maximum Compression
                2016/01/10   2016/06/08      2016/11/28
                   820.501      819.931         819.582 A10.jpg
                 1.017.441      997.721         976.062 AcroRd32.exe
                   400.343      381.471         375.731 english.dic
                 3.574.241    3.566.950       3.555.244 FlashMX.pdf
                   280.132      267.665         258.477 FP.LOG
                 1.387.079    1.365.962       1.338.128 MSO97.DLL
                   687.731      683.352         678.767 ohs.doc
                   653.728      634.380         633.578 rafale.bmp
                   399.793      391.522         386.441 vcfiu.hlp
                   372.831      362.592         356.292 world95.txt
                 9.593.820    9.471.546       9.378.302 Total
                 9.632.713    9.515.604       9.386.444 MaxCompr.tar (very close to single file compression total :-))
                                                100.031 sharnd_challenge.dat
                                                     34 Test_000
     0.1.1 -mx   0.2.0 ? -mx         0.2.0a2      0.2.0a3   >0.2.0 a3 -mx Silesia Open Source Compression Benchmark
    2016/01/10    2016/06/08   2016/07/02-06   2016/09-10      2016/11/28
     2.047.734     2.016.906       2.015.523    2.003.326       1.999.489 dickens
    10.117.960     9.958.936       9.942.493    9.812.955       9.800.725 mozilla
     2.065.773     2.004.632       2.003.968    2.002.364       2.003.809 mr
       969.798       932.177         931.890      925.225         920.860 nci
     1.698.391     1.662.164       1.658.048    1.625.486       1.622.001 ooffice
     2.081.744     2.052.039       2.052.089    2.042.336       2.042.487 osdb
       830.294       813.002         812.739      804.606         799.896 reymont
     2.799.600     2.764.836       2.762.207    2.740.579       2.725.908 samba
     3.807.684     3.776.032       3.775.953    3.764.244       3.764.346 sao
     5.292.616     5.184.527       5.177.512    5.126.226       5.080.963 webster
     3.577.455     3.556.659       3.555.720    3.554.770       3.556.252 x-ray
       281.484       276.571         276.435      275.044         272.247 xml
    35.570.533    34.998.481      34.964.577   34.677.161      34.588.983 Total
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx Large Text Compression Benchmark
    2016/01/10     (>&b12)   2016/06/08      2016/11/28
                        24           24              24 ENWIK0
                        31           32              32 ENWIK1
                        91           89              88 ENWIK2
                       299          287             284 ENWIK3
                     2.997        2.953           2.922 ENWIK4
                    25.568       25.248          25.076 ENWIK5
                   214.137      211.478         209.352 ENWIK6
                 1.968.717    1.941.513       1.915.141 ENWIK7
                18.153.319   17.898.994      17.650.885 ENWIK8
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx Other
    2016/01/10     (>&b12)   2016/06/08      2016/11/28
                   194.417      192.013         191.081 book1
                   613.782      605.705         599.057 Calgary corpus.tar
                   617.996                      603.261 Calgary corpus.tar.paq8pxd16
                   331.931      327.901         325.392 Canterbury corpus.tar
                   333.145                      326.907 Canterbury corpus.tar.paq8pxd16
    0.1.1 -m2,3,0x03ededff   >0.2.0 a3 -mx 10 GB Compression Benchmark
    2016/01/10     (>&b12)      2016/11/28
                33.024.880      33.072.180 100mb.tar (100mb subset) (tarball) (worse than 0.1.1 :-()
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx Compression Competition -- $15,000 USD
    2016/01/10     (>&b12)   2016/06/08      2016/11/01
                14.897.825   14.782.006      14.737.701 SRR062634.filt.fastq
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx Specific case - High redundant data in a pattern
    2016/01/10     (>&b12)   2016/06/08      2016/11/28
                     1.242        1.282           1.200 NUM.txt
    0.1.1 -m2,3,0x03ededff   >0.2.0 a3 -mx Testing compressors with artificial data
    2016/01/10     (>&b12)      2016/11/28
                 1.000.048       1.000.070 a6000
                   875.600         875.617 a6001
                   500.888         501.254 a6004
                   126.131         126.717 a6007
                        71              95 b6002
                        71              84 b6004
                        73              75 b6010
                       148              93 b6100
                        23              23 c0000
                        68              67 c6000
                        74             118 c6001
                       101              67 c6255
                       200             162 d6001
                        80             166 d6016
                        71             105 d6128
                       179             204 i6002
                       181             212 i6004
                       265             190 i6010
                       488             421 i6100
                       247             175 l6002
                       214             212 l6003
                       211             200 l6004
                       229             192 l6008
                       387             260 m6002
                       203             240 m6003
                       339             296 m6004
                       386             383 m6008
                   500.830         500.645 p6002
                   501.924         500.545 p6004
                   503.587         500.614 p6010
                   515.730         511.624 p6100
                   500.575         500.743 r6002
                   250.986         250.548 r6004
                   101.881         100.584 r6010
                    13.215          11.642 r6100
                   500.618         500.860 s6002
                   250.616         250.640 s6004
                   101.177         100.676 s6010
                    11.404          10.625 s6100
                     5.043           5.037 w4002
                    50.077          50.048 w5002
                   500.185         500.114 w6002
                   100.296         100.121 w6010
                    10.235          10.123 w6100
                 5.000.429       5.000.739 w7002
                50.076.893      96.493.563 w8002
                62.002.677     108.407.189 Total
    0.1.1 -m2,3,0x03ededff   PPMonstr J   >0.2.0 a3 -mx Generic Compression Benchmark
    2016/01/10     (>&b12)   2006/02/16      2016/11/28
                 2.923.969    3.003.153       2.865.327 uiq2-seed'0'-n1000000
                 2.924.115    3.002.768       2.865.090 uiq2-seed'1'-n1000000
                 2.924.322    3.003.251       2.865.234 uiq2-seed'2'-n1000000
                 2.925.567    3.004.457       2.866.751 uiq2-seed'3'-n1000000
                 2.922.792    3.001.321       2.864.111 uiq2-seed'4'-n1000000
                 2.924.509    3.003.981       2.865.290 uiq2-seed'5'-n1000000
                 2.924.659    3.003.513       2.865.911 uiq2-seed'6'-n1000000
                 2.924.616    3.003.203       2.865.552 uiq2-seed'7'-n1000000
                 2.925.579    3.004.358       2.867.011 uiq2-seed'8'-n1000000
                 2.926.293    3.004.433       2.867.016 uiq2-seed'9'-n1000000
                29.246.421   30.034.438      28.657.293 Total
                0,97376289   1,00000000      0,95414780 Size (Ratio)
    Prime Number Benchmark
       msb     lsb     dbl     hex    text   twice interlv   delta bytemap  bitmap    total  tarball tarball/total (total = 4.803.388; tarball = 4.812.800, 7z.exe a -ttar)
    39.547  39.314  40.403  41.851  39.019  39.569  40.831  34.251  25.789  32.110  372.684  292.304 0,78432130169 cmv -mx  0.1.1
    39.379  39.181  40.011  41.629  38.829  39.400  40.548  33.999  23.851  31.659  368.486  290.754 0,78905033027 cmv -max 0.1.1
    38.233  37.987  38.897  41.233  38.301  38.261  39.570  33.944  24.149  31.909  362.484  284.835 0,78578640712 cmv -mx  0.2.0 (2016/06/08)
    37.963  37.679  38.711  40.893  38.131  37.995  39.136  33.925  22.864  31.895  359.192  281.555 0,78385654469 cmv -mx  >0.2.0 a3 (2016/11/28)
    37.835  37.973  38.597  41.619  38.829  37.894  36.255  33.726  23.851  29.527  356.106                        Best overall (2016/01/14)
        nz      nz      nz    cmix     cmv      nz      nz    cmix     cmv    cmix
     0.1.1 -mx   0.1.1 Optimal   0.2.0a3 ICM1 Opt.   0.2.0a3 Extreme Opt.   >0.2.0 a3 -mx Darek's testbed
    2016/01/10      2016/01/10          2016/09/27             2016/09/27      2016/11/01
     1.404.493       1.399.464           1.381.660              1.380.299       1.386.222 0.WAV
       323.277         322.990             303.175                301.866         303.864 1.BMP
       835.277         834.675             744.125                734.102         745.829 A.TIF
       780.311         779.596             692.949                683.189         693.092 B.TGA
       327.568         325.838             318.478                318.919         321.261 C.TIF
       310.952         310.480             303.235                303.056         303.979 D.TGA
       497.089         496.837             496.142                494.695         496.487 E.TIF
       110.914         110.799             110.755                110.571         110.809 F.JPG
     1.367.023       1.366.851           1.358.405              1.356.314       1.357.977 G.EXE
       482.720         482.701             462.836                463.150         461.994 H.EXE
       227.898         227.791             217.797                217.981         217.765 I.EXE
        43.364          43.325              43.172                 43.179          43.173 J.EXE
     2.618.200       2.616.775           2.540.011              2.536.718       2.534.630 K.WAD
     2.830.322       2.829.223           2.751.490              2.745.121       2.751.856 L.PAK
        55.438          55.027              52.420                 51.990          52.617 M.DBF
        86.689          86.688              83.967                 83.837          83.811 N.ADX
         3.777           3.775               3.672                  3.668           3.670 O.APR
           947             938                 878                    879             912 P.FM3
       187.547         187.547             170.171                168.828         169.843 Q.WK3
        29.327          29.245              28.781                 28.768          28.782 R.DOC
        26.020          25.970              25.447                 25.435          25.418 S.DOC
        18.752          18.727              18.367                 18.367          18.285 T.DOC
         8.681           8.646               8.536                  8.532           8.530 U.DOC
        18.570          18.519              18.213                 18.206          18.198 V.DOC
        13.228          13.185              13.007                 13.002          12.995 W.DOC
        11.049          10.997              10.854                 10.857          10.863 X.DOC
           323             320                 314                    314             318 Y.CFG
           176             171                 166                    166             169 Z.MSG
    12.619.932      12.607.100          12.159.023             12.122.009      12.163.349 Total
                                                                               12.166.965 Testbed.tar (very close to single file compression total :-))
    
    2016/11/??
- Word model (bits 9-10):
  - Improved the non-word models.
  - Added a new model to option 1: distance from the nth character after an LF, 30 predictors/characters.
  - Added a new model to option 1: vowel/consonant/other, order 4 and 7, 3 * 2 = 6 predictors.
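
The vowel/consonant/other contexts above can be pictured with a small sketch (illustrative only, not CMV's actual code; the class mapping and base-3 packing are my assumptions):

```python
# Illustrative sketch of a vowel/consonant/other context: each byte is mapped
# to one of 3 classes, and the classes of the last N bytes form the model's
# context, here packed base-3 into a single id.

VOWELS = frozenset(b"aeiouAEIOU")

def char_class(byte: int) -> int:
    """0 = vowel, 1 = consonant, 2 = other."""
    if byte in VOWELS:
        return 0
    if chr(byte).isalpha():
        return 1
    return 2

def class_context(data: bytes, order: int) -> int:
    """Pack the classes of the last `order` bytes into one context id."""
    ctx = 0
    for b in data[-order:]:
        ctx = ctx * 3 + char_class(b)
    return ctx

# Orders 4 and 7, as in the entry above, would give two such contexts.
```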
    
    2016/11/04
- ICM: better hash calculation --> slightly better precision in predictions.
- Word model (bits 9-10): now all predictors use the new ICM --> slightly better predictions.
- Extreme version: added 8 new predictors to the delta model.
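
The "better hash calculation" entry refers to how the ICM indexes its tables; a common shape for such a context hash is sketched below (the multiplier and the final folding are illustrative assumptions, not CMV's actual calculation):

```python
# Illustrative multiplicative context hash of the kind CM compressors use to
# index a model's table; the constant and the fold are assumptions.

def ctx_hash(context: bytes, table_bits: int = 22) -> int:
    h = 0
    for b in context:
        h = (h * 0x9E3779B1 + b + 1) & 0xFFFFFFFF  # odd multiplier spreads bits
    h ^= h >> 16            # fold high bits into the low ones; "better hash
    h ^= h >> 8             #  calculation" is mostly about mixing like this
    return h & ((1 << table_bits) - 1)  # index into a 2^table_bits table
```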
    Benchmarks for version in development (2016/11/01 >0.2.0a3, VOM model is disabled), compared to the last official (0.1.1) and previous development (2016/06/08) versions.
                 0.1.1 -mx    0.2.0 -mx   >0.2.0.a3 -mx Maximum Compression
                2016/01/10   2016/06/08      2016/11/01
                   820.501      819.931         819.620 A10.jpg
                 1.017.441      997.721         977.181 AcroRd32.exe
                   400.343      381.471         376.271 english.dic
                 3.574.241    3.566.950       3.557.998 FlashMX.pdf
                   280.132      267.665         262.734 FP.LOG
                 1.387.079    1.365.962       1.339.122 MSO97.DLL
                   687.731      683.352         679.068 ohs.doc
                   653.728      634.380         633.386 rafale.bmp
                   399.793      391.522         387.262 vcfiu.hlp
                   372.831      362.592         358.775 world95.txt
                 9.593.820    9.471.546       9.391.417 Total
                 9.632.713    9.515.604       9.400.385 MaxCompr.tar (very close to single file compression total :-))
                                                100.031 sharnd_challenge.dat.cmv
                                                     34 Test_000.cmv
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0.a3 -mx Large Text Compression Benchmark
    2016/01/10     (>&b12)   2016/06/08      2016/11/01
                        24           24              24 ENWIK0
                        31           32              32 ENWIK1
                        91           89              88 ENWIK2
                       299          287             285 ENWIK3
                     2.997        2.953           2.937 ENWIK4
                    25.568       25.248          25.174 ENWIK5
                   214.137      211.478         210.225 ENWIK6
                 1.968.717    1.941.513       1.921.495 ENWIK7
                18.153.319   17.898.994      17.692.364 ENWIK8
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0.a3 -mx Other
    2016/01/10     (>&b12)   2016/06/08      2016/11/01
                   194.417      192.013         191.470 book1
                   613.782      605.705         600.980 Calgary corpus.tar
                   617.996                      604.811 Calgary corpus.tar.paq8pxd16
                   331.931      327.901         326.053 Canterbury corpus.tar
                   333.145                      327.473 Canterbury corpus.tar.paq8pxd16
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0.a3 -mx Compression Competition -- $15,000 USD
    2016/01/10     (>&b12)   2016/06/08      2016/11/01
                14.897.825   14.782.006      14.757.363 SRR062634.filt.fastq
    0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0.a3 -mx Specific case - High redundant data in a pattern
    2016/01/10     (>&b12)   2016/06/08      2016/11/01
                     1.242        1.282           1.243 NUM.txt
    Prime Number Benchmark
       msb     lsb     dbl     hex    text   twice interlv   delta bytemap  bitmap    total  tarball tarball/total (total = 4.803.388; tarball = 4.812.800, 7z.exe a -ttar)
    39.547  39.314  40.403  41.851  39.019  39.569  40.831  34.251  25.789  32.110  372.684  292.304 0,78432130169 cmv -mx  0.1.1
    39.379  39.181  40.011  41.629  38.829  39.400  40.548  33.999  23.851  31.659  368.486  290.754 0,78905033027 cmv -max 0.1.1
    38.233  37.987  38.897  41.233  38.301  38.261  39.570  33.944  24.149  31.909  362.484  284.835 0,78578640712 cmv -mx  0.2.0 (2016/06/08)
    38.037  37.772  38.759  40.902  38.133  38.068  39.301  33.914  22.814  31.880  359.580  281.752 0,78355859614 cmv -mx  >0.2.0a3 (2016/11/01)
    37.835  37.973  38.597  41.619  38.829  37.894  36.255  33.726  23.851  29.527  356.106                        Best overall (2016/01/14)
        nz      nz      nz    cmix     cmv      nz      nz    cmix     cmv    cmix
    
    
    2016/10/??
- Started maintaining the CMV Extreme version as well.
  It's CMV64 with 4x memory plus some improvements I made during the development of CMV; they are disabled in the standard version because they improve the compression ratio only slightly and I don't want to spend that much time or memory for a very small gain.
  - N-input mixer: works with 32 bits instead of 16. More precision most of the time, 2x memory.
  - New ICM: a context hash bucket holds 64 contexts instead of 16. More precision most of the time, it can take up to 4x the time to search the bucket, same memory.
  - Gap model 2 (bit 14): works with 36 (6x6) predictors instead of 16 (4x4), added order 4. ~2x time, 2x memory.
  - PPM (bit 27): added order 2. More time, more memory.
  - Last bits: a new silly model, always enabled; it has 15 predictors, but the gain is very small. More time, more memory.
  - Extreme may need 17 GB of RAM (I don't know if that's the maximum) and is more than 10% slower.
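
The bucket search behind the 64-vs-16 trade-off can be pictured like this (a minimal sketch; the names and the replacement rule are illustrative, and real ICMs also keep checksums and smarter eviction heuristics):

```python
# Minimal sketch of an ICM context-hash bucket: a lookup scans the bucket's
# slots for the context's key; a 64-slot bucket (Extreme) keeps 4x more
# contexts per hash index than a 16-slot one, at up to 4x the search time.

class Bucket:
    def __init__(self, slots: int):
        self.keys = [None] * slots   # context checksums
        self.vals = [0] * slots      # per-context state (e.g. bit history)

    def find(self, key: int) -> int:
        """Return the slot holding `key`, allocating one if it is absent."""
        for i, k in enumerate(self.keys):
            if k == key:
                return i             # hit: up to len(self.keys) comparisons
        self.keys[-1] = key          # miss: evict the last slot (simplistic)
        self.vals[-1] = 0
        return len(self.keys) - 1
```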
    
    2016/09/29
    Benchmarks for version in development (2016/09/28), VOM model is disabled.
    0.2.0a3 -mx Maximum Compression
        819.594 A10.jpg
        978.102 AcroRd32.exe
        377.493 english.dic
      3.558.708 FlashMX.pdf
        265.594 FP.LOG
      1.340.401 MSO97.DLL
        679.124 ohs.doc
        633.388 rafale.bmp
        387.773 vcfiu.hlp
        359.548 world95.txt
      9.399.725 Total
        100.031 sharnd_challenge.dat
             34 Test_000
    
    2016/09/??
    Added Exe model (bit 30).
    
    2016/08..09/??
Added a new-style ICM: it has more precision and reduces context hash collisions, but handles half the number of contexts.
So far I have changed the main Order-N, the Word (bits 9-10), the More sparse and masked (bits 15-16) and the Exe (bit 30) models to use the new ICM.
Applying the new ICM to Gap model 1 (bit 13) and the Delta model (bit 26) hurts compression.
    
    2016/08..09/??
    - Sparse match model (bits 1-3): small improvement for option 7, to be continued.
    - SSE (bit 29): very slightly improved.
    
    ~2016/07/??
The maximum order handled by the Variable order and memory model (VOM) has been reduced from 10 (the help was wrong: it said 9) to 8.
This should improve the compression ratio a little.
    
    2016/07/02
I will be on holiday from 2016/07/09 to 2016/07/24; I probably won't read encode.su during those days.
    
    2016/06/25
The first attempt to improve the speed or compression ratio of the ICM failed.
Now I'm adding a byte-level PPM: order 0, and order 1 with gap 0/1/2/3.
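
The gapped order-1 contexts can be read as follows (a sketch under the assumption that gap g means skipping g bytes between the context byte and the predicted one, so gap 0 is the ordinary order-1 context):

```python
# Sketch of a gapped order-1 context: an order-1 model with gap g predicts
# the next byte from the byte g+1 positions back in the history.

def order1_gap_context(history: bytes, gap: int):
    """Context byte for an order-1 PPM with the given gap, or None if the
    history is too short."""
    idx = len(history) - 1 - gap
    return history[idx] if idx >= 0 else None

h = b"abcde"
assert order1_gap_context(h, 0) == ord("e")  # ordinary order 1
assert order1_gap_context(h, 3) == ord("b")  # gap 3 skips 'c', 'd', 'e'
```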
    
    2016/06/10-2016/06/19
    Benchmarks for version in development (2016/06/08 (2)) compared to the last official and development (2016/02/01 (1)) versions.
    0.2.0 version (dev.(2)) will haven't "Variable order and memory model" (VOM) enabled by default in -mx (to save time and memory, sometimes it hurts compression ratio otherwise it saves only few bytes), you can add it with "-m2,3,>|b12".
     0.1.1 -mx dev.(1) -mx dev.(2) -mx Maximum Compression
       820.501     820.140     819.931 A10.jpg
     1.017.441     998.416     997.721 AcroRd32.exe
       400.343     383.246     381.471 english.dic
     3.574.241   3.567.537   3.566.950 FlashMX.pdf
       280.132     266.983     267.665 FP.LOG
     1.387.079   1.366.681   1.365.962 MSO97.DLL
       687.731     682.960     683.352 ohs.doc
       653.728     635.473     634.380 rafale.bmp
       399.793     392.447     391.522 vcfiu.hlp
       372.831     366.705     362.592 world95.txt
     9.593.820   9.480.588   9.471.546 Total
     9.632.713   9.524.816   9.515.604 MaxCompr.tar
     0.1.1 -mx dev.(1) -mx dev.(2) -mx Silesia Open Source Compression Benchmark (SOSCB)
     2.047.734   2.023.802   2.016.906 dickens
    10.117.960   9.966.096   9.958.936 mozilla
     2.065.773   2.012.206   2.004.632 mr      (SOSCB best)
       969.798     933.615     932.177 nci
     1.698.391   1.663.621   1.662.164 ooffice
     2.081.744   2.049.413   2.052.039 osdb
       830.294     815.091     813.002 reymont
     2.799.600   2.773.724   2.764.836 samba
     3.807.684   3.775.695   3.776.032 sao
     5.292.616   5.201.770   5.184.527 webster
     3.577.455   3.561.918   3.556.659 x-ray   (SOSCB best)
       281.484     277.915     276.571 xml
    35.570.533  35.054.866  34.998.481 Total
    0.1.1 -m2,3,0x03ededff dev.(1) -mx dev.(2) -mx Large Text Compression Benchmark
                   (>&b12)
                        24          24          24 ENWIK0
                        31          31          32 ENWIK1
                        91          88          89 ENWIK2
                       299         285         287 ENWIK3
                     2.997       2.947       2.953 ENWIK4
                    25.568      25.295      25.248 ENWIK5
                   214.137     211.863     211.478 ENWIK6
                 1.968.717   1.946.606   1.941.513 ENWIK7
                18.153.319  17.928.046  17.898.994 ENWIK8
     0.1.1 -mx dev.(1) -mx dev.(2) -mx Other
       194.417     192.249     192.013 book1
       613.782     606.341     605.705 Calgary corpus.tar
       331.931     327.840     327.901 Canterbury corpus.tar
     0.1.1 -mx dev.(1) -mx dev.(2) -mx Compression Competition -- $15,000 USD
    14.897.825  14.769.089  14.782.006 SRR062634.filt.fastq
     0.1.1 -mx dev.(1) -mx dev.(2) -mx 5 public files from Darek's testbed
    1.404.493    1.391.704   1.389.245 0.WAV
      310.952      306.922     306.998 D.TGA
      482.720      472.887     472.193 H.EXE
      187.547      179.161     179.542 Q.WK3
       29.327       28.987      28.951 R.DOC
    0.1.1 -m2,3,0x03ededff dev.(1) -mx Specific case - High redundant data in a pattern
                   (>&b12)
                     1.242       1.282 NUM.txt (I don't like this result)
    Prime Number Benchmark
       msb     lsb     dbl     hex    text   twice interlv   delta bytemap  bitmap    total  tarball tarball/total (total = 4.803.388; tarball = 4.812.800, 7z.exe a -ttar)
    39.547  39.314  40.403  41.851  39.019  39.569  40.831  34.251  25.789  32.110  372.684  292.304 0,78432130169 cmv -mx  0.1.1
    38.233  37.987  38.897  41.233  38.301  38.261  39.570  33.944  24.149  31.909  362.484  284.835 0,78578640712 cmv -mx  dev. (2)
    39.379  39.181  40.011  41.629  38.829  39.400  40.548  33.999  23.851  31.659  368.486  290.754 0,78905033027 cmv -max 0.1.1
    37.835  37.973  38.597  41.619  38.829  37.894  36.255  33.726  23.851  29.527  356.106                        Best overall (2016/01/14)
        nz      nz      nz    cmix     cmv      nz      nz    cmix     cmv    cmix     
    
    2016/06/05
I finished working on the word model:
    - Changed from DCM to ICM.
    - Now it handles 4x contexts.
    - Current word contexts: wordbuf(0), wordbuf(0) + wordbuf(1), wordbuf(0) + wordbuf(2), wordbuf(0) + wordbuf(1) + wordbuf(2) + wordbuf(3).
- Now the word and non-word models always return a prediction.
    - Deleted 3 secondary mixed predictions.
I changed some hash calculations to slightly improve the predictions and the speed of some models.
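
The four word contexts listed above can be sketched as follows, where wordbuf(i) stands for a hash of the i-th most recent word (wordbuf(0) being the word currently read); the combining function is an illustrative assumption, not CMV's actual one:

```python
# Sketch of the four word contexts: the current word alone, the current word
# combined with earlier words, and the run of the last four words.

def mix(h: int, w: int) -> int:
    return (h * 0x1000193 + w) & 0xFFFFFFFF  # FNV-style combine (assumption)

def word_contexts(wordbuf):
    """wordbuf[i] = hash of the i-th most recent word."""
    w0, w1, w2, w3 = wordbuf[:4]
    return [
        w0,                              # wordbuf(0)
        mix(w0, w1),                     # wordbuf(0) + wordbuf(1)
        mix(w0, w2),                     # wordbuf(0) + wordbuf(2)
        mix(mix(mix(w0, w1), w2), w3),   # wordbuf(0)+(1)+(2)+(3)
    ]
```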
    
    2016/05/21 (2016/09/20 Added sony2.arw)
    Cmv 0.1.1, option -m2,0,+, Squeeze Chart (PDF, JPG, MP3, PNG, Installer.. (Compressing Already Compressed Files)), result not verified.
                Documents
      5.452.943 busch2.epub
     84.864.362 diato.sf2
     38.534.942 freecol.jar
     12.571.833 maxpc.pdf
        791.372 squeeze.xlsx
                Image Formats (Camera Raw)
     22.043.852 canon.cr2
     11.109.503 fuji.raf
     32.532.436 nikon.nef
     15.859.052 oly.orf
     11.978.628 sigma.x3f
     14.945.507 sony.arw
     20.288.840 sony2.arw
                Image Formats (web)
      1.781.843 filou.gif
      4.642.178 flumy.png
      6.834.776 mill.jpg
                Installers
    102.250.965 amd.run
    184.804.805 cab.tar
     54.089.566 inno.exe
     18.528.542 setup.msi
     54.112.501 wise.exe
                Interactive files
        340.066 flyer.msg
      5.898.822 swf.tar
                Scientific Data
         14.915 block.hex
    147.688.356 msg_lu.trace
    110.989.967 num_brain.trace
     34.696.885 obs_temp.trace
                Songs (Tracker Modules)
      6.501.083 it.it
     15.354.477 mpt.mptm
      7.286.070 xm.xm
                Songs (web)
     19.245.422 aac.aac
     16.520.159 diatonis.wma
    127.419.074 mp3corpus.tar
     36.332.287 ogg.ogg
                Videos (web)
     15.139.010 a55.flv
     32.151.082 h264.mkv
    162.883.854 star.mov
     96.480.593 van_helsing.ts
    
    2016/05/20
    Cmv 0.1.1, option -mx, Squeeze Chart (Txt Bible (Compressing Text In Different Languages)), result not verified.
       693.479 afri.txt
       746.900 alb.txt
       624.237 ara.txt
       705.168 chi.txt
       922.105 cro.txt
       749.296 cze.txt
       695.223 dan.txt
       729.682 dut.txt
       659.272 eng.txt
       649.197 esp.txt
       723.353 fin.txt
       692.336 fre.txt
       717.826 ger.txt
       221.765 gre.txt
       625.971 heb.txt
       782.307 hun.txt
       729.490 ita.txt
       637.865 kor.txt
       816.845 lat.txt
       676.332 lit.txt
       678.285 mao.txt
       694.682 nor.txt
       712.650 por.txt
       712.598 rom.txt
       744.556 rus.txt
       708.648 spa.txt
       758.391 swe.txt
       687.045 tag.txt
       778.152 thai.txt
       662.605 turk.txt
       712.785 vie.txt
       715.801 xho.txt
    22.364.847 Total
    
    2016/05/07
Cmv 0.1.1, vm.dll, result not verified.
    820.313.136 -m2,3
    
    2016/04/23
    I'm working on the mixer and word model.
    Current tests on SqueezeChart "Compressing Already Compressed Files" are very nice.
    
    2016/04/08
    Cmv 0.1.1, ENWIK9.DRT, result not verified.
    140.808.418 -m2,3,0x03ededff
    
    2016/04/01 (2016/04/08 Added R.DOC)
    Benchmarks for version in development (2016/02/01) compared to the last official version.
    0.1.1 -mx  develop.-mx  5 public files from Darek's testbed
    1.404.493    1.391.704  0.WAV
      310.952      306.922  D.TGA
      482.720      472.887  H.EXE
      187.547      179.161  Q.WK3
       29.327       28.987  R.DOC
    
    2016/03/26
    Cmv 0.1.1, test file 2547.gif posted in paq8px, results not verified.
    126.907.065 -m0
    125.064.783 -m1
    125.045.518 -m2
    124.811.418 -m2,0,+
     91.144.579 Precomp v0.4.4 -cn | cmv -m0
     77.591.645 Precomp v0.4.4 -cn | cmv -m1
     72.082.466 Precomp v0.4.4 -cn | cmv -mx
    
    2016/03/26
A thread pool needs a rather long procedure to be useful.
I won't implement it for the moment.
    
    2016/02/20-27
    Cmv 0.1.1, test files posted in Text strings coding chemical structures.
           -m0    ratio     bpb        -mx    ratio     bpb       Original
            19  0,00000  0,0000         23  0,00000   0,0000             0 01+crlf-duplicity.smi
            21  7,00000 56,0000         25  8,33333  66,6667             3 01+crlf-sorted-duplicity_included.smi
            21  7,00000 56,0000         25  8,33333  66,6667             3 01+crlf-sorted-duplicity_removed.smi
            21  7,00000 56,0000         25  8,33333  66,6667             3 01+crlf.smi
            21 10,50000 84,0000         25 12,50000 100,0000             2 01.smi
            19  0,00000  0,0000         23  0,00000   0,0000             0 02+crlf-duplicity.smi
            30  2,14286 17,1429         32  2,28571  18,2857            14 02+crlf-sorted-duplicity_included.smi
            30  2,14286 17,1429         32  2,28571  18,2857            14 02+crlf-sorted-duplicity_removed.smi
            30  2,14286 17,1429         31  2,21429  17,7143            14 02+crlf.smi
            28  2,54545 20,3636         29  2,63636  21,0909            11 02.smi
            19  0,00000  0,0000         23  0,00000   0,0000             0 03+crlf-duplicity.smi
            55  0,78571  6,2857         48  0,68571   5,4857            70 03+crlf-sorted-duplicity_included.smi
            55  0,78571  6,2857         48  0,68571   5,4857            70 03+crlf-sorted-duplicity_removed.smi
            56  0,80000  6,4000         48  0,68571   5,4857            70 03+crlf.smi
            53  0,91379  7,3103         46  0,79310   6,3448            58 03.smi
            19  0,00000  0,0000         23  0,00000   0,0000             0 04+crlf-duplicity.smi
           119  0,37072  2,9657         92  0,28660   2,2928           321 04+crlf-sorted-duplicity_included.smi
           119  0,37072  2,9657         93  0,28972   2,3178           321 04+crlf-sorted-duplicity_removed.smi
           125  0,38941  3,1153         97  0,30218   2,4174           321 04+crlf.smi
           121  0,43525  3,4820         96  0,34532   2,7626           278 04.smi
            19  0,00000  0,0000         23  0,00000   0,0000             0 05+crlf-duplicity.smi
           309  0,21150  1,6920        228  0,15606   1,2485         1.461 05+crlf-sorted-duplicity_included.smi
           309  0,21150  1,6920        229  0,15674   1,2539         1.461 05+crlf-sorted-duplicity_removed.smi
           350  0,23956  1,9165        248  0,16975   1,3580         1.461 05+crlf.smi
           349  0,26784  2,1427        258  0,19800   1,5840         1.303 05.smi
            19  0,00000  0,0000         23  0,00000   0,0000             0 06+crlf-duplicity.smi
         1.416  0,13441  1,0753        870  0,08258   0,6607        10.535 06+crlf-sorted-duplicity_included.smi
         1.398  0,13270  1,0616        858  0,08144   0,6515        10.535 06+crlf-sorted-duplicity_removed.smi
         1.843  0,17494  1,3995      1.064  0,10100   0,8080        10.535 06+crlf.smi
         1.872  0,19537  1,5629      1.109  0,11574   0,9259         9.582 06.smi
            65  0,55556  4,4444         59  0,50427   4,0342           117 07+crlf-duplicity.smi
         7.922  0,10099  0,8079      3.999  0,05098   0,4078        78.443 07+crlf-sorted-duplicity_included.smi
         7.862  0,10038  0,8030      3.955  0,05049   0,4040        78.326 07+crlf-sorted-duplicity_removed.smi
        11.011  0,14037  1,1230      5.510  0,07024   0,5619        78.443 07+crlf.smi
        11.237  0,15520  1,2416      5.686  0,07853   0,6283        72.402 07.smi
           222  0,18049  1,4439        174  0,14146   1,1317         1.230 08+crlf-duplicity.smi
        48.522  0,08193  0,6554     20.112  0,03396   0,2717       592.237 08+crlf-sorted-duplicity_included.smi
        48.143  0,08146  0,6517     19.878  0,03363   0,2691       591.007 08+crlf-sorted-duplicity_removed.smi
        68.820  0,11620  0,9296     31.406  0,05303   0,4242       592.237 08+crlf.smi
        69.956  0,12658  1,0127     31.789  0,05752   0,4602       552.648 08.smi
           859  0,10592  0,8473        576  0,07102   0,5682         8.110 09+crlf-duplicity.smi
       315.453  0,06851  0,5481    110.170  0,02393   0,1914     4.604.485 09+crlf-sorted-duplicity_included.smi
       312.400  0,06797  0,5437    108.927  0,02370   0,1896     4.596.375 09+crlf-sorted-duplicity_removed.smi
       459.274  0,09974  0,7980    197.709  0,04294   0,3435     4.604.485 09+crlf.smi
       462.969  0,10687  0,8550    198.545  0,04583   0,3667     4.331.887 09.smi
         5.687  0,07334  0,5867      3.202  0,04129   0,3303        77.546 10+crlf-duplicity.smi
     2.162.672  0,05995  0,4796    694.588  0,01926   0,1540    36.072.267 10+crlf-sorted-duplicity_included.smi
     2.145.700  0,05961  0,4769    687.312  0,01909   0,1528    35.994.721 10+crlf-sorted-duplicity_removed.smi
     3.159.277  0,08758  0,7007  1.370.511  0,03799   0,3039    36.072.267 10+crlf.smi
     3.190.041  0,09339  0,7471  1.388.889  0,04066   0,3253    34.157.176 10.smi
        29.229  0,05760  0,4608     13.362  0,02633   0,2106       507.475 11+crlf-duplicity.smi
    15.158.795  0,05249  0,4200  4.649.225  0,01610   0,1288   288.771.259 11+crlf-sorted-duplicity_included.smi
    15.040.303  0,05218  0,4174  4.607.367  0,01598   0,1279   288.263.784 11+crlf-sorted-duplicity_removed.smi
    22.382.099  0,07751  0,6201  9.912.052  0,03432   0,2746   288.771.259 11+crlf.smi
    22.639.068  0,08236  0,6589 10.061.120  0,03660   0,2928   274.870.869 11.smi
       211.497  0,04940  0,3952     83.257  0,01945   0,1556     4.281.332 12+crlf-duplicity.smi
     1.558.991  0,04199  0,3359    517.891  0,01395   0,1116    37.129.750 13+crlf-duplicity.smi
    89.516.939  0,06652  0,5321 34.733.065  0,02581   0,2065 1.345.800.583 Total
    
    2016/02/13 (2016/02/14 Edit text)
I'm looking into implementing a thread pool (Win32 threads): it's a nightmare.
    
    2016/02/02 (2016/02/07 Added MaxCompr.tar, LTCB, Other. 2016/03/13 Added SRR062634.filt.fastq)
    Benchmarks for version in development (2016/02/01) compared to the last official version.
     0.1.1 -mx develop.-mx Maximum Compression
       820.501     820.140 A10.jpg
     1.017.441     998.416 AcroRd32.exe
       400.343     383.246 english.dic
     3.574.241   3.567.537 FlashMX.pdf
       280.132     266.983 FP.LOG
     1.387.079   1.366.681 MSO97.DLL
       687.731     682.960 ohs.doc
       653.728     635.473 rafale.bmp
       399.793     392.447 vcfiu.hlp
       372.831     366.705 world95.txt
     9.593.820   9.480.588 Total
     9.632.713   9.524.816 MaxCompr.tar
     0.1.1 -mx develop.-mx Silesia Open Source Compression Benchmark (SOSCB)
     2.047.734   2.023.802 dickens
    10.117.960   9.966.096 mozilla
     2.065.773   2.012.206 mr      (SOSCB best)
       969.798     933.615 nci
     1.698.391   1.663.621 ooffice
     2.081.744   2.049.413 osdb
       830.294     815.091 reymont
     2.799.600   2.773.724 samba
     3.807.684   3.775.695 sao
     5.292.616   5.201.770 webster
     3.577.455   3.561.918 x-ray   (SOSCB best)
       281.484     277.915 xml
    35.570.533  35.054.866 Total
    0.1.1 -m2,3,0x03ededff develop.-mx Large Text Compression Benchmark
                   (>&b12)
                        24          24 ENWIK0
                        31          31 ENWIK1
                        91          88 ENWIK2
                       299         285 ENWIK3
                     2.997       2.947 ENWIK4
                    25.568      25.295 ENWIK5
                   214.137     211.863 ENWIK6
                 1.968.717   1.946.606 ENWIK7
                18.153.319  17.928.046 ENWIK8
    0.1.1 -m2,3,0x03ededff develop.-mx Other
                   194.417     192.249 book1
                   613.782     606.341 Calgary corpus.tar
                   331.931     327.840 Canterbury corpus.tar
    0.1.1 -m2,3,0x00a968fd develop.-mx Compression Competition -- $15,000 USD
                14.897.825  14.769.089 SRR062634.filt.fastq
    
    2016/01/28
Version 0.1.1, new test on the Silesia Open Source Compression Benchmark: precomp 0.4.4 -cn | cmv -mx (-mx is for comparison).
          -mx   .pcf | -mx Silesia Open Source Compression Benchmark
     2.047.734   2.047.749 dickens
    10.117.960   8.195.417 mozilla
     2.065.773   2.065.797 mr
       969.798     969.801 nci
     1.698.391   1.698.456 ooffice
     2.081.744   2.081.772 osdb
       830.294     830.318 reymont
     2.799.600   1.989.420 samba
     3.807.684   3.807.687 sao
     5.292.616   5.292.618 webster
     3.577.455   3.577.474 x-ray
       281.484     281.496 xml
    35.570.533  32.838.005 Total
    
    2016/01/26 (2016/10/27 Added Switches)
    Version 0.1.1, new test on Maximum Compression benchmark: -max options (-mx is for comparison).
          -mx      -max         Switches Maximum Compression
      820.501   819.832 -m2,3,0x00ad69f5 A10.jpg
    1.017.441 1.017.402 -m2,3,0x03ededff AcroRd32.exe
      400.343   395.674 -m2,3,0x00a9fb7d english.dic
    3.574.241 3.573.680 -m2,3,0x03edfc7d FlashMX.pdf
      280.132   278.282 -m2,3,0x03eb7dff FP.LOG (-max -a15,M5)
    1.387.079 1.386.634 -m1,3,0x03ede5fd MSO97.DLL
      687.731   687.186 -m2,3,0x03ededfd ohs.doc
      653.728   651.837 -m1,2,0x00a9f9bf rafale.bmp
      399.793   399.784 -m2,3,0x03edfdbf vcfiu.hlp
      372.831   372.551 -m2,3,0x03ededfd world95.txt
    9.593.820 9.582.862                  Total
    
    2016/01/16 (2016/01/22 Added benchmark)
Since the beginning, cmv has had the quantized entropy of the previous few bytes in the contexts of 3 mixers (1+2+0) of the last-stage mixers (6+3+1).
Now I'm testing a model which has in its context the quantized entropy (q.e.) of the previous few bits and the q.e. of the same bit position in the previous few bytes.
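
"Quantized entropy" as a mixer context can be sketched like this: take the code length (in bits) the model spent on recent symbols and bucket it into a few levels. The quantization thresholds below are illustrative assumptions, not CMV's actual values:

```python
import math

# Sketch of quantized entropy (q.e.) as a mixer context: well-predicted data
# lands in a low bucket, surprising data in a high one, so the mixer can learn
# different weights for "easy" and "hard" regions.

def code_length(p_correct: float) -> float:
    """Bits spent coding a symbol that was predicted with probability p."""
    return -math.log2(p_correct)

def quantize_entropy(bits: float, levels: int = 4, max_bits: float = 8.0) -> int:
    """Map a bit cost to one of `levels` buckets for use as a mixer context."""
    q = int(bits * levels / max_bits)
    return min(q, levels - 1)
```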
      -m2,0,* +new model Maximum Compression
      820.749    820.736 A10.jpg.cmv
    1.019.353  1.018.527 AcroRd32.exe.cmv
      403.026    402.533 english.dic.cmv
    3.577.000  3.576.947 FlashMX.pdf.cmv
      281.065    280.861 FP.LOG.cmv
    1.391.254  1.390.140 MSO97.DLL.cmv
      688.787    688.334 ohs.doc.cmv
      653.804    651.149 rafale.bmp.cmv
      400.620    400.145 vcfiu.hlp.cmv
      374.695    374.560 world95.txt.cmv
    9.610.353  9.603.932 Total
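A toy sketch of how such a quantized-entropy (q.e.) context can be computed. This is only an illustration under assumed parameters, not CMV's actual code; the function name and the 8-bits-per-byte clamp are hypothetical:

```python
import math

def quantized_entropy(code_lengths, bits=3):
    """Quantize the average code length (in bits per symbol) of the last
    few coded symbols into a small bucket usable as a mixer context.
    code_lengths holds -log2(p) for each recently coded symbol."""
    if not code_lengths:
        return 0
    avg = sum(code_lengths) / len(code_lengths)
    # Clamp to [0, 8) bits, then map onto 2**bits buckets.
    avg = min(max(avg, 0.0), 8.0 - 1e-9)
    return int(avg / 8.0 * (1 << bits))

# Recent symbols coded at probabilities 0.9, 0.8 and 0.5:
bucket = quantized_entropy([-math.log2(p) for p in (0.9, 0.8, 0.5)])
```

The resulting bucket (0..7 with 3 bits) becomes part of the mixer context; tracking the same bit position across previous bytes would feed a separate history into the same quantizer.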
    
    2016/01/14
    Stuff made for version 0.2:
- Added a mixer2 of 2 mixerN, one with a fast and one with a slow learning rate: mixer2(mixerN_fast_lr(), mixerN_slow_lr()).
  The mixer will be ~2x slower and will need ~2x the memory.
  I need to add a switch to enable/disable this mixer.
  Some tests in http://encode.su/threads/2284-CMV?p=...ll=1#post46203
- During cmv development I tested some naive SSE; last month I implemented a serious SSE, but it is only good for doubling the input predictions of a mixer, not for tuning a mixer.
  Linear SSE seems to be better than logistic SSE.
  5-bit quantization is better, but 3-bit is quite good and 4x smaller, so I chose 3 bits.
  The mixer will be ~2x slower and will need ~2x the memory.
  I need to add a switch to enable/disable this SSE.
  SSE2 (2 input predictions + context) hurts the compression ratio.
- Proof of concept: added 4*2 low-order delta-like models.
  Added 4 low-order interleaved models and 1 BMP-oriented model (no detection of the BMP ID).
  I need to add a switch to enable/disable them.
        -mx (1)     -mx (2)    -mx (3) Maximum Compression
        820.501     820.891    820.160 A10.jpg
      1.017.441   1.014.559    999.176 AcroRd32.exe
        400.343     398.988    383.418 english.dic
      3.574.241   3.575.464  3.567.760 FlashMX.pdf
        280.132     279.555    267.099 FP.LOG
      1.387.079   1.386.033  1.367.716 MSO97.DLL
        687.731     688.414    683.408 ohs.doc
        653.728     641.349    636.813 rafale.bmp
        399.793     399.125    392.915 vcfiu.hlp
        372.831     372.893    366.762 world95.txt
      9.593.820   9.577.271  9.485.227 Total
    -mx (1): 0.1.1
    -mx (2): -mx (1) + delta-like + interleaved + BMP-oriented
    -mx (3): -mx (2) + SSE + mix2(mixN_fast, mixN_slow), ~5720 Mb, speed ~2.8x slower than -mx (1)
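The linear SSE described above (few-bit quantization, linear rather than logistic interpolation) can be sketched roughly like this. The class name, identity initialization and learning rate are hypothetical assumptions, not CMV's actual values:

```python
class LinearSSE:
    """Minimal linear SSE sketch: per context, quantize the input
    probability onto 2**bits intervals and linearly interpolate
    between the two nearest bin-edge estimates."""
    def __init__(self, n_contexts, bits=3, rate=0.02):
        self.edges = (1 << bits) + 1         # number of bin edges
        # Initialize each context's table to the identity mapping p -> p,
        # so the SSE is neutral before any training.
        self.t = [[i / (self.edges - 1) for i in range(self.edges)]
                  for _ in range(n_contexts)]
        self.rate = rate

    def predict(self, ctx, p):
        x = p * (self.edges - 1)
        lo = min(int(x), self.edges - 2)
        w = x - lo                           # interpolation weight
        self._last = (ctx, lo, w)
        row = self.t[ctx]
        return (1 - w) * row[lo] + w * row[lo + 1]

    def update(self, bit):
        # Move both neighboring edges toward the observed bit,
        # weighted by how close the input probability was to each.
        ctx, lo, w = self._last
        row = self.t[ctx]
        row[lo] += self.rate * (1 - w) * (bit - row[lo])
        row[lo + 1] += self.rate * w * (bit - row[lo + 1])
```

Because the table starts as the identity mapping, the SSE is a no-op until it learns; the linear interpolation between edges is what distinguishes it from a plain bucketed counter.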
    
    2016/01/12 (2016/01/14 Added cmix)
    Prime Number Benchmark http://encode.su/threads/2414-Prime-Number-Benchmark
      msb    lsb    dbl    hex   text  twice interlv delta bytemap bitmap total  tarball tarball/total (total = 4803388; tarball = 4812800, 7z.exe a -ttar)
    39547  39314  40403  41851  39019  39569  40831  34251  25789  32110  372684  292304 0,78432130169 cmv -mx
    39379  39181  40011  41629  38829  39400  40548  33999  23851  31659  368486  290754 0,78905033027 cmv -max
    37835  37973  38597  41629  38829  37894  36255  33999  23851  30518  357380                       Best overall (2016/01/12)
       nz     nz     nz    cmv    cmv     nz     nz    cmv    cmv   zpaq
    37835  37973  38597  41619  38829  37894  36255  33726  23851  29527  356106                       Best overall (2016/01/14)
       nz     nz     nz   cmix    cmv     nz     nz   cmix    cmv   cmix
    msb     -m0,0,0x00abf9fd lsb     -m1,1,0x00a96b7f dbl   -m0,3,0x00ab69bd hex     -m1,2,0x02a1e97f text   -m1,3,0x00a1e57f
    twice   -m0,0,0x00abf9fd interlv -m1,3,0x01a8f86d delta -m2,3,0x03a809bd bytemap -m0,3,0x00aa096e bitmap -m0,0,0x00aaa8ef
    tarball -m2,3,0x03ade97f
The word model is disabled in 9 of 11 cases.
The variable order and memory model is disabled in 8 of 11 cases.
The counter type is always set to 0 (1 counter) (11/11).
    
    2016/01/10 Start this experiment.
I don't want to spend much time verifying my English in this post; sorry if it's bad.

    CMV-00.02.00-alpha*.7z are protected by password.
Attached Files
    Last edited by Mauro Vezzosi; 19th October 2017 at 23:54.

  7. #35
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
I'm trying to improve the N-input mixer because it seems that sometimes it is slow to learn or weights the models inaccurately.
During the development of cmv I tested many naive improvements; most of them hurt compression and the rest gain just a little (they are now disabled).
Here are some of my notes and questions:
- To speed up the learning of the mixer, I divided the N-input mixer into 4 sub-mixers with N/4 inputs each, then mixed the 4 sub-mixers, but compression was worse. A binary tree structure was also bad.
- I tested SAC's update (isn't it good only for offline neural-network learning?) and MCM's skew update (I need to retest it more carefully): both hurt cmv's compression.
- Cmv doesn't have learning rate (l.r.) decay, because I think it helps only at the beginning of compression.
Instead, in the next release, cmv will be able to switch mixerN to a mixer2 of 2 mixerN, one with a fast and one with a slow l.r.: mixer2(mixerN_fast_lr(), mixerN_slow_lr()):

    Code:
    Maximum Compression
          -m1                 -m2,0,*
      821.357 ->   820.585    820.749 ->   820.203   A10.jpg
    1.078.743 -> 1.062.963  1.019.353 -> 1.003.584   AcroRd32.exe
      422.134 ->   408.529    403.026 ->   386.589   english.dic
    3.595.144 -> 3.589.232  3.577.000 -> 3.569.634   FlashMX.pdf
      317.694 ->   308.710    281.065 ->   269.102   FP.LOG
    1.447.944 -> 1.429.470  1.391.254 -> 1.372.519   MSO97.DLL
      701.802 ->   697.324    688.787 ->   684.083   ohs.doc
      691.838 ->   687.293    653.804 ->   648.500   rafale.bmp
      426.997 ->   419.478    400.620 ->   394.323   vcfiu.hlp
      399.994 ->   395.238    374.695 ->   368.766   world95.txt
    9.903.647 -> 9.818.822  9.610.353 -> 9.517.303   Total
                            9.680.949 -> 9.593.013   Tarball
    
    Silesia Open Source Compression Benchmark
       -m2,0,*
     2.061.655 ->  2.038.269   dickens
    10.234.858 -> 10.111.310   mozilla
     2.068.033 ->  2.052.658   mr
       974.167 ->    943.376   nci
     1.704.074 ->  1.676.073   ooffice
     2.087.518 ->  2.053.212   osdb
       834.869 ->    821.554   reymont
     2.824.314 ->  2.801.082   samba
     3.811.344 ->  3.780.772   sao
     5.386.512 ->  5.299.166   webster
     3.577.793 ->  3.566.768   x-ray :-)
       282.400 ->    279.063   xml
    35.847.537 -> 35.423.303   Total
    
    Large Text Compression Benchmark
       -m2,0,*
            24 ->         24   ENWIK0
            31 ->         31   ENWIK1
            90 ->         88   ENWIK2
           297 ->        286   ENWIK3
         2.996 ->      2.949   ENWIK4
        25.587 ->     25.319   ENWIK5
       214.576 ->    212.349   ENWIK6
     1.985.628 ->  1.963.722   ENWIK7
    18.780.157 -> 18.549.497   ENWIK8
    
    Calgary corpus (14 files)
    -m2,0,*
    615.111 -> 610.207   Tarball
    Canterbury corpus
    -m2,0,*
    332.694 -> 329.932   Tarball
- Has anyone successfully added momentum to the mixer's weight update? I haven't.
- Did anyone try any kind of history in the mixer (like models with bit history), or implement something like a "secondary weight estimation" (like SSE/APM)? All my attempts failed. By history I mean everything that happened in the past.
- Does anyone have any more suggestions to test in the mixer?
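The mixer2(mixerN_fast_lr(), mixerN_slow_lr()) construction above can be sketched as two logistic mixers sharing the same inputs but updated with different learning rates, combined by a 2-input second-stage mixer. All learning-rate values here are illustrative assumptions, not CMV's:

```python
import math

def stretch(p): return math.log(p / (1 - p))
def squash(x): return 1 / (1 + math.exp(-x))

class MixerN:
    """Logistic mixer: weighted sum of stretched input probabilities."""
    def __init__(self, n, lr):
        self.w = [0.0] * n
        self.lr = lr
    def mix(self, probs):
        self.x = [stretch(min(max(p, 1e-6), 1 - 1e-6)) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p
    def update(self, bit):
        err = bit - self.p                  # online gradient step
        for i, x in enumerate(self.x):
            self.w[i] += self.lr * err * x

class Mixer2FastSlow:
    """mixer2(mixerN_fast_lr(), mixerN_slow_lr()) sketch."""
    def __init__(self, n, fast_lr=0.02, slow_lr=0.002):
        self.fast = MixerN(n, fast_lr)
        self.slow = MixerN(n, slow_lr)
        self.top = MixerN(2, 0.01)
    def mix(self, probs):
        return self.top.mix([self.fast.mix(probs), self.slow.mix(probs)])
    def update(self, bit):
        self.fast.update(bit)
        self.slow.update(bit)
        self.top.update(bit)
```

The fast mixer tracks local changes quickly, the slow one retains long-term statistics, and the top mixer learns which of the two to trust; this is also why the construction costs roughly 2x time and memory.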

  8. #36
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    CMV 0.1.1 is compatible with version 0.1.0 and doesn't change compression ratio or speed.
    History.txt:

    If you test cmv, take a look at Readme.txt for some more info.
    EDIT: The "a" command can take long time because it uses a brute force search.

    CMV-00.01.01.zip includes Benchmarks.txt.
    Hi!
    I've tested this version. Scores are in attached files (JPG and Excel).

For now, the option "-m2,3,>" (or -mx) works fine; it beats the previous CMV version's record (options "-m2,3,0x03ededff") by 7 bytes, and beats some previous CMV records on particular files.

I've tested the analyse modes (-max, -amx, -ax -mx); however, they give the same results as just the "-mx" option... They took the same amount of time, so I suspect that maybe the analyse mode doesn't work. Maybe I'm using it wrong?

    Best Regards
    Darek
Attached Thumbnails: Scoretable.jpg (1.66 MB)
Attached Files
    Last edited by Darek; 12th October 2016 at 13:55.

  9. Thanks:

    Mauro Vezzosi (1st March 2016)

  10. #37
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    Quote Originally Posted by Darek View Post
    Hi!
    I've tested this version. Scores are in attached files (JPG and Excel).

For now, the option "-m2,3,>" (or -mx) works fine; it beats the previous CMV version's record (options "-m2,3,0x03ededff") by 7 bytes, and beats some previous CMV records on particular files.

I've tested the analyse modes (-max, -amx, -ax -mx); however, they give the same results as just the "-mx" option... They took the same amount of time, so I suspect that maybe the analyse mode doesn't work. Maybe I'm using it wrong?

    Best Regards
    Darek
    Hi.
    Many thanks for your tests!
    You are too fast.
    Let me explain better.
-max, -amx, -ax -mx and -mx -ax are the same thing, so use only -max; it's more mnemonic.
-ax and -mx can be useful if you want to set the maximum "analysis" or method individually, e.g.: -ax -m2 (or -m2 -ax), -a7,M5 -mx (or -mx -a7,M5).
-a sets some options for the command "a"; if you don't specify the command "a", the switch -a is useless.

For your testbed, to analyse+compress+compare, if you trust the CMV command "o" (which compares the expanded data (without creating a file) with the original file), you can write:
    for %a in (0.WAV 1.BMP A.TIF B.TGA C.TIF D.TGA E.TIF F.JPG G.EXE H.EXE I.EXE J.EXE K.WAD L.PAK M.DBF N.ADX O.APR P.FM3 Q.WK3 R.DOC S.DOC T.DOC U.DOC V.DOC W.DOC X.DOC Y.CFG Z.MSG) do cmv64 ac -max -vv %a %a.cmv && echo. && cmv64 o %a.cmv %a
    You'll see something like this for every file (here I used -m0, but it doesn't matter):
    Code:
    Read  Read%->Write/W.Blk Final Ratio  /R. Blk  bpb   /b. Blk Elaps+Remai=Final
    2919K100.0%-> 471K/5785B  471K 0.16144/0.08827 1.2915/0.7062 42s35+00s00=42s35
    Done: bytes in 2988578, bytes out 482506, ratio 0.16145, bpb 1.2916
          time 42s37 (42.37 seconds), method 0,0,0x00100010
    
    In     482506 out    2988578 ratio 0.16145 bpb 1.2916 time 12s62 (12.62s)
    Compare: Ok
In version 0.1.1, the final compressed size (in this case "bytes out 482506") also includes the header size (it's the full file size).
If it's no bother, for every file I would like to know the string "method 0,0,0x00100010".

To analyse+compress+expand+compare, if you don't trust the CMV command "o", you can write (here I used the MS-DOS FC to compare the files):
    for %a in (0.WAV 1.BMP A.TIF B.TGA C.TIF D.TGA E.TIF F.JPG G.EXE H.EXE I.EXE J.EXE K.WAD L.PAK M.DBF N.ADX O.APR P.FM3 Q.WK3 R.DOC S.DOC T.DOC U.DOC V.DOC W.DOC X.DOC Y.CFG Z.MSG) do cmv64 ac -max -vv %a %a.cmv && echo. && cmv64 e %a.cmv %a.cmv.expanded && fc /b %a %a.cmv.expanded
    You'll see something like this for every file (here I used -m0, but it doesn't matter):
    Code:
    Read  Read%->Write/W.Blk Final Ratio  /R. Blk  bpb   /b. Blk Elaps+Remai=Final
    2919K100.0%-> 471K/5785B  471K 0.16144/0.08827 1.2915/0.7062 41s72+00s00=41s72
    Done: bytes in 2988578, bytes out 482506, ratio 0.16145, bpb 1.2916
          time 41s73 (41.73 seconds), method 0,0,0x00100010
    
    In     482506 out    2988578 ratio 0.16145 bpb 1.2916 time 12s07 (12.07s)
    Confronto in corso dei file world95.txt e world95.txt.cmv.expanded
    FC: nessuna differenza riscontrata
(FC's output above is in Italian: "Comparing files world95.txt and world95.txt.cmv.expanded" / "FC: no differences found".)
To be safe, make a backup of your files before using CMV on them.
Your Excel file is very interesting; I need time to analyse it.
    Again, thank you very much!
    Mauro

  11. #38
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Hi.
    Many thanks for your tests!
    You are too fast.
    Let me explain better.
-max, -amx, -ax -mx and -mx -ax are the same thing, so use only -max; it's more mnemonic.
-ax and -mx can be useful if you want to set the maximum "analysis" or method individually, e.g.: -ax -m2 (or -m2 -ax), -a7,M5 -mx (or -mx -a7,M5).
-a sets some options for the command "a"; if you don't specify the command "a", the switch -a is useless.

For your testbed, to analyse+compress+compare, if you trust the CMV command "o" (which compares the expanded data (without creating a file) with the original file), you can write:
    for %a in (0.WAV 1.BMP A.TIF B.TGA C.TIF D.TGA E.TIF F.JPG G.EXE H.EXE I.EXE J.EXE K.WAD L.PAK M.DBF N.ADX O.APR P.FM3 Q.WK3 R.DOC S.DOC T.DOC U.DOC V.DOC W.DOC X.DOC Y.CFG Z.MSG) do cmv64 ac -max -vv %a %a.cmv && echo. && cmv64 o %a.cmv %a

    You'll see something like this for every file (here I used -m0, but it doesn't matter):
To be safe, make a backup of your files before using CMV on them.
Your Excel file is very interesting; I need time to analyse it.
    Again, thank you very much!
    Mauro
    Thank you very much!
Now I understand. I started analysing with a small file (r.doc) and it works!
I'll post the scores after the test, of course with the method signatures.

These files are a testbed, and I have copies of them to ensure the original files are safe.
My final scores in the table are always the full file length - i.e. they include the header size (it's the full file size).

Best Regards,
    Darek
    Last edited by Darek; 1st March 2016 at 15:04.

  12. #39
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
The first files after analysis and optimisation are in the attached file. The sheet "CMV methods" contains the methods, as its name says.

Regarding the Q.WK3 file: the optimal method gives the same score as -mx.
I realized earlier that for this file the critical factor is memory size - look at the other compressors' scores: the more memory used, the better the score - especially for CMIX.

    Regards,
    Darek
Attached Files
    Last edited by Darek; 7th March 2016 at 10:04.

  13. #40
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
CMV v00.01.01 Enwik8 score is 18.152.564 bytes, time 64436,41s; hardware: i7 4900MQ, 2.8GHz OC to 3.6GHz, 16GB, Win7Pro 64. Options "-mx". Decompression verified: time 82332,50s, SHA1 checksum OK, decompression at 3.2GHz. Memory used: 4133MB.
I'll check other options for enwik8 and will use the best one to test enwik9.

    Regards,
    Darek
    Last edited by Darek; 8th March 2016 at 13:26.

  14. Thanks:

    Mauro Vezzosi (8th March 2016)

  15. #41
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    Thank you very much!
I suppose the option was -mx (without analysis), not -max (with analysis).
Remember to check how much memory CMV needs, otherwise it can't be added to the LTCB.
    Mauro

  16. #42
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Thank you very much!
I suppose the option was -mx (without analysis), not -max (with analysis).
Remember to check how much memory CMV needs, otherwise it can't be added to the LTCB.
    Mauro
    Ok, I've updated previous post.

The command line was: "cmv64 -c -max enwik8 enwik8max.cmv". I understand that without additional parameters this option runs as "-mx", so it should be changed to -mx.

    Darek

  17. #43
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
The full test of CMV with analysis and optimisation is in the attached file - done!
The sheet "CMV methods" contains the methods, as its name says.

Generally the gain is about 0.10% over the "-mx" version, and about 0.06% over the best scores from various options.

    Darek
Attached Files
    Last edited by Darek; 11th March 2016 at 12:36.

  18. Thanks:

    Mauro Vezzosi (11th March 2016)

  19. #44
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    Thank you very much!
    Code:
         Total   Non model       Model
    12.619.932   5.881.644   6.738.288   -mx
    12.607.100   5.878.735   6.728.365   -max
        12.832       2.909       9.923   Gain
         0,102%      0,049%      0,147%  Gain %
The gain is very small.
Good: it seems the mixer works quite well, and even if a model doesn't fit the data structure, it can still be a little bit useful.
Bad: by disabling some models or other stuff, the compression ratio can only be improved a little bit.

    Next days I'll see sheet "CMV methods".

    Again, thanks!
    Bye.
    Mauro

  20. #45
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Thank you very much!
    Code:
         Total   Non model       Model
    12.619.932   5.881.644   6.738.288   -mx
    12.607.100   5.878.735   6.728.365   -max
        12.832       2.909       9.923   Gain
         0,102%      0,049%      0,147%  Gain %
The gain is very small.
Good: it seems the mixer works quite well, and even if a model doesn't fit the data structure, it can still be a little bit useful.
Bad: by disabling some models or other stuff, the compression ratio can only be improved a little bit.

    Next days I'll see sheet "CMV methods".

    Again, thanks!
    Bye.
    Mauro
Maybe it looks like a tiny gain; however, for record breaking it could be worth testing.

On the other hand, for enwik9 a gain of 0,10% is one and a half times the size of the CMV decompressor (zipped), and in some cases it could mean one position higher on the LTCB list.

@Matt - on the LTCB page there is a label on one column called "Decompresser size (zip)" - should it be "Decompressor size (zip)" instead?

    Regards,
    Darek

  21. #46
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    CMV v00.01.01 Enwik8 score: 18.152.564 bytes, time: 64436,41s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.6GHz, 16GB, Win7Pro 64. Options "-mx". Decompression verified. Time 82332,50, SHA1 checksum OK. Decompression at speed 3.2GHz. Memory used: 4133MB.

    CMV v00.01.01 Enwik8 score: 18.124.130 bytes, time: 47928,33s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.6GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfd". Decompression to be verified.

    Analyse option in progress.

I'll check other options for enwik8 and will use the best one to test enwik9.

    Regards,
    Darek

  22. #47
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
> @Matt - on the LTCB page there is a label on one column called "Decompresser size (zip)" - should it be "Decompressor size (zip)" instead?

    I've seen it spelled both ways. Google gives 481K hits with "e", 427K hits with "o".

  23. Thanks:

    Darek (13th March 2016)

  24. #48
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    Quote Originally Posted by Darek View Post
    CMV v00.01.01 Enwik8 score: 18.152.564 bytes, time: 64436,41s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.6GHz, 16GB, Win7Pro 64. Options "-mx". Decompression verified. Time 82332,50, SHA1 checksum OK. Decompression at speed 3.2GHz. Memory used: 4133MB.

    CMV v00.01.01 Enwik8 score: 18.124.130 bytes, time: 47928,33s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.6GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfd". Decompression to be verified.
Gain: 28434 bytes (0,157%); 16508,08s (25,62% faster)!
-m2,3,0x03ed7dfd is -mx without the 39 match models with long gap.
Estimated new ENWIK9.cmv, using the current best compression (-m2,3,+) as the baseline: (ENWIK9.cmv / ENWIK8.cmv) * new ENWIK8.cmv = 150226739 / 18218283 * 18124130 = 149450360
Hopefully it will be a little bit smaller, between 149.0 and 149.3 MB (MB = 10^6 bytes).
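The estimate above is plain ratio scaling; a one-line helper (hypothetical, just restating the post's arithmetic):

```python
def scale_estimate(enwik9_old, enwik8_old, enwik8_new):
    """Assume the ENWIK9/ENWIK8 ratio achieved with the old options
    carries over to the new options, and scale the new ENWIK8 size."""
    return enwik9_old / enwik8_old * enwik8_new

est = scale_estimate(150226739, 18218283, 18124130)  # ~149.45 * 10**6
```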
    Thanks.
    Mauro

  25. #49
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
Gain: 28434 bytes (0,157%); 16508,08s (25,62% faster)!
-m2,3,0x03ed7dfd is -mx without the 39 match models with long gap.
Estimated new ENWIK9.cmv, using the current best compression (-m2,3,+) as the baseline: (ENWIK9.cmv / ENWIK8.cmv) * new ENWIK8.cmv = 150226739 / 18218283 * 18124130 = 149450360
Hopefully it will be a little bit smaller, between 149.0 and 149.3 MB (MB = 10^6 bytes).
    Thanks.
    Mauro
My other cmv 00.01.01 scores for ENWIK8:

    18 152 564 bytes, time: 64436,41s, option: -mx
    18 491 124 bytes, time: 12116,98s, option: -m2,2,>
    18 485 825 bytes, time: 69964,14s, option: -m2,1,>
    18 642 021 bytes, time: 62988,10s, option: -m2,0,>
    18 153 319 bytes, time: 67512,17s, option: -m2,3,0x03ededff
    18 153 319 bytes, time: 69197,15s, option: -m2,3,0x03ededff -mx
    18 494 431 bytes, time: 44679,49s, option: -m1,2,0x03ed7dfd
    18 435 068 bytes, time: 44543,11s, option: -m1,3,0x03ed7dfd
    18 287 879 bytes, time: 50909,89s, option: -m2,2,0x03ed7dfd
    18 130 935 bytes, time: 45875,75s, option: -m2,3,0x03ed7dfa
    18 122 372 bytes, time: 45678,87s, option: -m2,3,0x03ed7dfb
    18 132 671 bytes, time: 46272,49s, option: -m2,3,0x03ed7dfc
    18 124 130 bytes, time: 47928,33s, option: -m2,3,0x03ed7dfd
    18 146 656 bytes, time: 54785,06s, option: -m2,3,0x03ed7dfe
    18 138 339 bytes, time: 53747,68s, option: -m2,3,0x03ed7dff
    18 126 353 bytes, time: 48684,66s, option: -m2,3,0x03ed7df9
    18 139 139 bytes, time: 23999,95s, option: -m2,3,0x03ed5dfb
    18 135 526 bytes, time: 41808,82s, option: -m2,3,0x03ecfdfb
    18 127 405 bytes, time: 52573,19s, option: -m2,3,0x02ed7dfb
18 192 132 bytes, time: 36604,93s, option: -m2,3,0x03ed6cb4 - option found by analysing the first 100KB of enwik8, started from -mx
18 163 895 bytes, time: 38382,20s, option: -m2,3,0x03ed6db7 - option found by analysing the first 500KB of enwik8, started from -mx
18 160 373 bytes, time: 48598,53s, option: -m2,3,0x03ed6dbb - option found by analysing the first 1MB of enwik8, started from -mx
18 160 373 bytes, time: 47395,41s, option: -m2,3,0x03ed6dbb - option found by analysing the first 1MB of enwik8, started from -m2,3,0x03ed7dfb
18 160 373 bytes, time: 50698,54s, option: -m2,3,0x03ed6dbb - option found by analysing the first 1MB of enwik8, started from -m2,3,0x03ed6dfb
18 160 373 bytes, time: 49117,21s, option: -m2,3,0x03ed6dbb - option found by analysing the first 1MB of enwik8, started from the default
18 123 133 bytes, time: 40353,56s, option: -m2,3,0x03ed6dfb - option found by analysing the first 5MB of enwik8, started from -mx
18 122 372 bytes, time: 45678,87s, option: -m2,3,0x03ed7dfb - 1st best option found by analysing the first 10MB of enwik8, started from -m2,3,0x03ed7dfb; time from a previous test
18 141 490 bytes, time: 37886,82s, option: -m2,3,0x03ed7ddb - 2nd best option found by analysing the first 10MB of enwik8, started from -m2,3,0x03ed7dfb
18 123 133 bytes, time: 40353,56s, option: -m2,3,0x03ed6dfb - 3rd best option found by analysing the first 10MB of enwik8, started from -m2,3,0x03ed7dfb; time from a previous test
18 123 571 bytes, time: 32853,56s, option: -m2,3,0x03ed7cfb - 4th best option found by analysing the first 10MB of enwik8, started from -m2,3,0x03ed7dfb

    Darek
    Last edited by Darek; 21st March 2016 at 02:49.

  26. Thanks:

    Mauro Vezzosi (15th March 2016)

  27. #50
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
With big support from Mauro Vezzosi, we analysed the first 10M of ENWIK8 and tested ENWIK8 and ENWIK9 with the best method found for CMV v00.01.01. Results as follows:

    CMV v00.01.01 Enwik8 score: 18.122.372 bytes, time: 45678,87s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression verified. Time 43861,21s, SHA1 checksum OK. Memory used: 3335MB.
    CMV v00.01.01 Enwik9 score: 149.357.765 bytes, time: 426162,96s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression to be verified. Memory used: 3335MB.

    Darek

  28. Thanks:

    Mauro Vezzosi (20th March 2016)

  29. #51
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    It's all to your credit!
    Many thanks to you for your tests, patience and time.

Compared to other compressors, CMV and EMMA compress ENWIK9 worse relative to ENWIK8; probably they are more memory sensitive.
    Code:
                                                      Compressed size      ENW9/ENW8  ENW8/ENW9
                                                    ENWIK8       ENWIK9
    cmix v8                                       15,709,216  123,930,173  7,8890107  0,1267586
    durilca'kingsize  -m13000 -o40 -t2            16,209,167  127,377,411  7,8583564  0,1272531
    paq8pxd_v12_biondivers_x64 -11                16,361,221  129,435,477  7,9111135  0,1264045
    paq8hp12any       -8                          16,230,028  132,045,026  8,1358471  0,1229128
    zpaq 6.42         -m s10.0.5fmax6             17,855,729  142,252,605  7,9667767  0,1255213
    drt|lpaq9m        9                           17,964,751  143,943,759  8,0125663  0,1248040
    mcm 0.83          -x11                        18,233,295  144,854,575  7,9445089  0,1258731
    nanozip 0.09a     -cc -m32g -p1 -t1 -nm       18,594,163  148,545,179  7,9888070  0,1251751
    cmv 00.01.00      -m2,3,+                     18,218,283  150,226,739  8,2459329  0,1212719
    cmv 00.01.00      -m2,3,0x03ed7dfb            18,122,372  149,357,765  8,2416234  0,1213353
    emma 0.1.4        (max English text)          17,865,328  148,887,824  8,3338982  0,1199919
    Mauro

  30. #52
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    It's all to your credit!
    Many thanks to you for your tests, patience and time.

Compared to other compressors, CMV and EMMA compress ENWIK9 worse relative to ENWIK8; probably they are more memory sensitive.
    Code:
                                                      Compressed size      ENW9/ENW8  ENW8/ENW9
                                                    ENWIK8       ENWIK9
    cmix v8                                       15,709,216  123,930,173  7,8890107  0,1267586
    durilca'kingsize  -m13000 -o40 -t2            16,209,167  127,377,411  7,8583564  0,1272531
    paq8pxd_v12_biondivers_x64 -11                16,361,221  129,435,477  7,9111135  0,1264045
    paq8hp12any       -8                          16,230,028  132,045,026  8,1358471  0,1229128
    zpaq 6.42         -m s10.0.5fmax6             17,855,729  142,252,605  7,9667767  0,1255213
    drt|lpaq9m        9                           17,964,751  143,943,759  8,0125663  0,1248040
    mcm 0.83          -x11                        18,233,295  144,854,575  7,9445089  0,1258731
    nanozip 0.09a     -cc -m32g -p1 -t1 -nm       18,594,163  148,545,179  7,9888070  0,1251751
    cmv 00.01.00      -m2,3,+                     18,218,283  150,226,739  8,2459329  0,1212719
    cmv 00.01.00      -m2,3,0x03ed7dfb            18,122,372  149,357,765  8,2416234  0,1213353
    emma 0.1.4        (max English text)          17,865,328  148,887,824  8,3338982  0,1199919
    Mauro
Or maybe these compressors have something like adaptive learning - the bigger the file, the better the compression. CMIX has (as I understand it) some kind of neural-net learning algorithm.
As far as I found, this option (Adaptive Learning Rate) doesn't work so well in EMMA, and maybe that's why its score is similar to CMV's.

    Darek

  31. #53
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    578
    Thanks
    220
    Thanked 834 Times in 342 Posts
    Quote Originally Posted by Darek View Post
Or maybe these compressors have something like adaptive learning - the bigger the file, the better the compression. CMIX has (as I understand it) some kind of neural-net learning algorithm.
As far as I found, this option (Adaptive Learning Rate) doesn't work so well in EMMA, and maybe that's why its score is similar to CMV's.

    Darek
    Actually, the adaptive learning rate is part of what helps EMMA score reasonably well on the LTCB:

    enwik8 - 17.848.906 bytes in 6088s, Adaptive learning rate on
    enwik8 - 17.897.964 bytes in 5926s, Adaptive learning rate off

I don't know about CMV (though I suspect it might be the same reason), but for EMMA the reason for the limited performance on enwik9
is basically the "low" memory usage, less than 1000MB. I believe it's not so much that the performance of EMMA and CMV is bad on enwik9;
it's more a case of them being very good with such low memory on enwik8. If given more memory, I think both would see a nice improvement
on enwik9.

    Personally, the most impressive results are from MCM

  32. #54
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,274
    Thanks
    803
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by mpais View Post
    Actually, the adaptive learning rate is part of what helps EMMA score reasonably well on the LTCB:

    enwik8 - 17.848.906 bytes in 6088s, Adaptive learning rate on
    enwik8 - 17.897.964 bytes in 5926s, Adaptive learning rate off

I don't know about CMV (though I suspect it might be the same reason), but for EMMA the reason for the limited performance on enwik9
is basically the "low" memory usage, less than 1000MB. I believe it's not so much that the performance of EMMA and CMV is bad on enwik9;
it's more a case of them being very good with such low memory on enwik8. If given more memory, I think both would see a nice improvement
on enwik9.

    Personally, the most impressive results are from MCM
Ok, I understand. Sorry for this; my apologies.
From my testbed files I had gotten the wrong impression.

So, as I see it, increasing memory is quite an easy way to improve CM compression performance. The question is: where is the limit of this memory-driven improvement?

I support the improvement of both compressors and the development paths you chose. However, I'm still quietly looking forward to the established records being beaten...

    MCM results are really impressive.

    Darek

  33. #55
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
I think that an Adaptive Learning Rate (ALR) isn't so useful on ENWIK* because ENWIK* is quite stationary, and ALR helps only in the first few dozen MB.

When I wrote that CMV and EMMA are more memory sensitive, I meant that while compressing the first N MB they can be more efficient than other compressors, but afterwards they can become less efficient:
- If I understood how the *PAQ* series works, CMV's ICM needs less memory per context and its prediction is less precise: more memory --> fewer collisions --> more precise predictions.
- EMMA uses less memory; it seems to have very good prediction and, I guess, it needs more memory per context: more memory --> more contexts handled --> more precise predictions.

    Obviously, other compressors improves compression ratio using more memory, but I suppose they have quite good prediction and not need much memory per context (they are between CMV and EMMA and have more linear performance).

    MCM don't impress me so much in ENWIK*, it has LZP and it need 5GB. CMV easily beats it in Maximum Compression and Silesia benchmark.

    I am impressed by EMMA: with ~1/6 memory EMMA is better than MCM in ENWIK8 and with ~4GB memory it could reachs MCM in ENWIK9. EMMA easily beats CMV in Maximum Compression and Silesia benchmark.

    This is just my opinion and supposition.

    Mauro

  34. #56
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    578
    Thanks
    220
    Thanked 834 Times in 342 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    I think an Adaptive Learning Rate (ALR) isn't so useful on ENWIK* because the ENWIK* files are quite stationary, and ALR helps only in the first few dozen MB.

    When I wrote that CMV and EMMA are more memory sensitive, I meant that when compressing the first N MB they can be more efficient than other compressors, but after that they can be less efficient:
    - If I understood how the *PAQ* series works, the CMV ICM needs less memory per context and its prediction is less precise: more memory --> fewer collisions --> more precise predictions.
    - EMMA uses less memory overall; it seems to have very good prediction and, I guess, it needs more memory per context: more memory --> more contexts handled --> more precise predictions.

    Obviously, other compressors also improve their compression ratio with more memory, but I suppose they have quite good prediction and don't need much memory per context (they sit between CMV and EMMA and have more linear performance).

    MCM doesn't impress me so much on ENWIK*: it uses LZP and it needs 5 GB. CMV easily beats it on the Maximum Compression and Silesia benchmarks.

    I am impressed by EMMA: with ~1/6 of the memory, EMMA is better than MCM on ENWIK8, and with ~4 GB of memory it could reach MCM on ENWIK9. EMMA easily beats CMV on the Maximum Compression and Silesia benchmarks.

    This is just my opinion and supposition.

    Mauro
    On the contrary, in EMMA, the adaptive learning rate only benefits compression when dealing with stationary sources, because it can only lower the rate,
    so when you have a file like enwik* where the structure is stationary, it can help get a better convergence on a local minimum.

    I'll try to explain how it works. Suppose your predictions are in the interval ]0,100[ and that for the previous bit you correctly predicted a "1" with 97% confidence.
    Your error is only 3, so clearly your models are doing a good job, and as such their respective weights are well balanced. So you will now adjust the
    weights, by some factor of the error (the learning rate), so as to minimize the coding cost. Now you encode the next bit, this time you predict a "0" with
    95% confidence (meaning the value of the prediction is 5), and again you are correct, so your error is -5. You adjust the weights again. You keep doing this,
    and getting really small errors, sometimes positive, sometimes negative. So the magnitude of the error is small, but you keep using the same factor for adjusting
    the weights. In this case, it may be useful to reduce this factor, to give the mixer a better chance of slowly converging on a better weight distribution. This is
    obviously a gross oversimplification, but I hope I've made the principle clear.

    In EMMA, when using ALR, the mixer keeps stats about the previous errors, and when it deems it worthwhile, slowly reduces the learning rate. If there is a benefit,
    the rate will be further decreased, but if a degradation occurs, it immediately restores the rate to its maximum value. This gives nice gains on stationary files of
    any size, from book1, to dickens, to enwik8. It also helps explain why it hurts on book2 (contains formatting) and world95.txt (not really text, more like a list of
    topics/facts, with many numbers followed by different units).
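
    As a rough illustration, the mechanism described above might look something like the following sketch. This is hypothetical code, not EMMA's actual implementation: the window size, decay factor and cost statistic are my own assumptions, chosen only to show the "only decrease, reset on degradation" behaviour.

```python
import math

class AdaptiveRate:
    # Toy adaptive learning rate: the rate can only decrease, and is
    # restored to its maximum as soon as a degradation is detected.
    def __init__(self, max_rate=0.01, min_rate=0.0005, decay=0.9, window=4096):
        self.max_rate = max_rate
        self.min_rate = min_rate
        self.decay = decay
        self.window = window
        self.rate = max_rate
        self.best_cost = float('inf')
        self.cost = 0.0
        self.n = 0

    def observe(self, bit, p):
        # accumulate the coding cost of each prediction over a window
        self.cost += -math.log2(p if bit else 1.0 - p)
        self.n += 1
        if self.n == self.window:
            if self.cost <= self.best_cost:
                # still improving: keep lowering the rate
                self.best_cost = self.cost
                self.rate = max(self.min_rate, self.rate * self.decay)
            else:
                # degradation: restore the rate to its maximum
                self.rate = self.max_rate
                self.best_cost = self.cost
            self.cost = 0.0
            self.n = 0
        return self.rate
```

    On a stationary file the per-window cost keeps shrinking, so the rate drifts down; on a file whose statistics shift (book2, world95.txt), a worse window snaps the rate back up.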

    As for the memory, EMMA uses many different structures for the context data, none of which differ much in memory usage per context from PAQ.
    In general, I classify them per number of cache-line accesses per context. PAQ8 uses 3 with its context map, EMMA can use 1 to 3, though only the
    structures with 2 or 3 are used currently.

    I just think you are not giving your CMV enough credit: it's not that its performance in enwik9 is "bad", it's just that its performance on enwik8, WHEN considering
    the memory used, is very, very good. If you look at the LTCB and search for compressors using roughly the same memory, you'll see that only those specifically
    optimized for that benchmark can beat it. lpaq9m is using DRT as a preprocessor, paq8hp12_any is incredibly optimized for enwik. To me this means that its text
    model (if you are using one) is very good, and would probably scale nicely when given many GB of memory to stretch its legs.

    As always, just my 2 cents, feel free to rebut me, I'm always looking forward to new ideas

  35. Thanks (2):

    Darek (22nd March 2016),Mauro Vezzosi (23rd March 2016)

  36. #57
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Quote Originally Posted by Darek View Post
    Or these compressors have something like adaptive learning: the bigger the file, the better the compression. CMIX has (as I understand) some kind of neural net learning algorithm.
    As I found, this option (Adaptive Learning Rate) doesn't work so well in EMMA, and maybe that's why its score is similar to CMV's.

    Darek
    Memory is the biggest factor. CMIX uses 30 GB. Next two use 13 GB.

    All of the top 10 except durilca use context mixing. Mixing is all done with simple 2 layer neural networks (no hidden layer). That just means the input predictions p_i are converted to logistic domain (stretch(p_i) = log(p_i/(1 - p_i))), and mixed by weighted summation, x = SUM_i w_i stretch(p_i). The output is converted back by the inverse (p = squash(x) = stretch^-1 (x) = 1/(1 + e^-x)). Then the weights are adjusted by the prediction error like w = w + (L)(bit - p) where the learning rate L is typically .001 to .01. This is gradient descent back propagation to minimize coding cost, which is actually simpler than minimizing RMS error. (That would be w = w + (L)(bit - p)(p)(1 - p)).
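
    The mixing scheme described above can be sketched in a few lines of Python. This is a toy illustration, not any particular compressor's code; note that PAQ-style mixers multiply the weight update by the input stretch(p_i) (the full gradient of the coding cost for this linear-in-stretch mixer), which the simplified formula above omits.

```python
import math

def stretch(p):
    # map a probability to the logistic domain: log(p / (1 - p))
    return math.log(p / (1.0 - p))

def squash(x):
    # inverse of stretch: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def mix(predictions, weights):
    # weighted sum of stretched predictions, squashed back to a probability
    x = sum(w * stretch(p) for w, p in zip(weights, predictions))
    return squash(x)

def update(predictions, weights, bit, rate=0.01):
    # gradient descent on coding cost: w_i += L * (bit - p) * stretch(p_i).
    # Minimizing RMS error instead would add a factor p * (1 - p),
    # the derivative of squash, which is where the extra (p)(1 - p) comes from.
    p = mix(predictions, weights)
    err = bit - p
    return [w + rate * err * stretch(pi) for w, pi in zip(weights, predictions)]
```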

    durilca uses PPM, which is more memory efficient than CM but limited to contiguous contexts. It is closed source but I suspect it is also doing dictionary, space, and capitalization modeling before PPM and probably some CM post-processing techniques like SSE.

    zpaq is the top program not using a dictionary at #5 (and probably only because it used a 14 GB model). In theory, whole word modeling should be just as effective, but a dictionary reduces the input size, which saves memory.

  37. Thanks (2):

    Darek (23rd March 2016),Mauro Vezzosi (23rd March 2016)

  38. #58
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    290
    Thanks
    120
    Thanked 169 Times in 125 Posts
    Quote Originally Posted by mpais View Post
    [...Adaptive Learning Rate...]
    I'm doing some tests...

    Quote Originally Posted by mpais View Post
    To me this means that its text model (if you are using one) is very good, and would probably scale nicely when given many GB of memory to stretch its legs.
    CMV doesn't use a specific text model, but its sparse models help on text.
    It does have a word model of orders 0, 1 and 2; however, order 2 is buggy and works like order 0.
    In your 20th March 2016, 21:44 ENWIK8/9 tests you set the complexity of the text model to Low: is that the best option for ENWIK8/9, or did you forget to set it to High?

    Quote Originally Posted by Matt Mahoney View Post
    This is gradient descent back propagation to minimize coding cost, which is actually simpler than minimizing RMS error. (That would be w = w + (L)(bit - p)(p)(1 - p)).
    I read something about RMS and IIRC I tried to implement it, without success; probably I did something wrong.
    Can RMS also be used in an online NN? I have read that it is used to train an NN on a set of data until RMS < value, which suggests it was used for offline/batch NNs.
    I can't figure out how RMS leads to w = w + (L)(bit - p)(p)(1 - p).

    Mauro
    Last edited by Mauro Vezzosi; 23rd March 2016 at 17:42. Reason: Text tuning

  39. Thanks:

    mpais (23rd March 2016)

  40. #59
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    578
    Thanks
    220
    Thanked 834 Times in 342 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Mixing is all done with simple 2 layer neural networks (no hidden layer).
    Not in EMMA, depending on the options used.

    Quote Originally Posted by Matt Mahoney View Post
    zpaq is the top program not using a dictionary at #5 (and probably only because it used a 14 GB model). In theory, whole word modeling should be just as effective, but a dictionary reduces the input size, which saves memory.
    In EMMA the dictionary is only used to pre-train the compression engine, it is not used for any transformation, since EMMA is a streaming compressor.

    Best regards

  41. #60
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    578
    Thanks
    220
    Thanked 834 Times in 342 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    I'm doing some tests...

    CMV doesn't use a specific text model, but its sparse models help on text.
    It does have a word model of orders 0, 1 and 2; however, order 2 is buggy and works like order 0.
    In your 20th March 2016, 21:44 ENWIK8/9 tests you set the complexity of the text model to Low: is that the best option for ENWIK8/9, or did you forget to set it to High?

    I read something about RMS and IIRC I tried to implement it, without success; probably I did something wrong.
    Can RMS also be used in an online NN? I have read that it is used to train an NN on a set of data until RMS < value, which suggests it was used for offline/batch NNs.
    I can't figure out how RMS leads to w = w + (L)(bit - p)(p)(1 - p).

    Mauro
    Sorry, my mistake, took the screenshot on one computer while the other was compressing, and it seems the preset file wasn't the same. The text model
    should be set to High complexity for maximum compression.

    Do you have any tests you'd like me to try?
