1. > I can't figure out how RMS is involved in w = w + (L)(bit - p)(p)(1 - p).

Oops, I left out the input as a factor. If the input is x, then backpropagation modifies the weight by (x)(L)(error)(p)(1 - p), where p is the output and error = desired value - p. This assumes a cost of (desired value - p)^2 (RMS error). The equation is derived by taking the partial derivative of the cost with respect to the weight.
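As a quick sanity check on that derivative (my own illustration, with made-up values for w, x and the target): for a single input, the analytic gradient of the squared-error cost is -2(x)(error)(p)(1 - p), and it matches a finite-difference estimate; the constant 2 is absorbed into the learning rate L in the update above.

```python
import math

# Single logistic unit: p = sigmoid(w*x), cost = (desired - p)^2.
# d(cost)/dw = -2*x*(desired - p)*p*(1 - p); the update in the post
# absorbs the factor 2 into L, giving w += L*x*error*p*(1 - p).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, x, desired):
    p = sigmoid(w * x)
    return (desired - p) ** 2

w, x, desired = 0.3, 1.5, 1.0          # illustrative values
p = sigmoid(w * x)
analytic = -2 * x * (desired - p) * p * (1 - p)   # d(cost)/dw
h = 1e-6                                          # finite-difference step
numeric = (cost(w + h, x, desired) - cost(w - h, x, desired)) / (2 * h)
```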

But for compression, the cost is actually -log(1 - p) to code a 0 bit or -log p to code a 1 bit, where p is the probability that the bit is 1. Let's solve for a 1 bit. Since p = 1/(1 + exp(-SUM (w_i)(x_i))), we can find the partial derivative d(log p)/dw_i with respect to the i'th weight w_i; the gradient of the cost is just its negative.
= d(log 1/(1 + exp(-SUM (w_i)(x_i))))/dw_i
= d(log 1/(1 + exp(-(w_i)(x_i))))/dw_i [because the other weights are not a function of w_i]
= d(log 1/(1 + exp(-wx)))/dw [simplifying notation w_i to w and x_i to x]
= d(-log(1 + exp(-wx)))/dw [log 1/y = -log y]
= -1/(1 + exp(-wx)) (0 - x exp(-wx)) [derivative by chain rule]
= x exp(-wx)/(1 + exp(-wx)) [rearrange terms]
= x/(1/exp(-wx) + 1) [divide top and bottom by exp(-wx)]
= x/(exp(wx) + 1)
= x(1 - p) [since 1 - p = exp(-wx)/(1 + exp(-wx)) = 1/(exp(wx) + 1)]

Thus, for a 1 bit the gradient of the cost -log p is -x(1 - p), so gradient descent with step L moves w by (L)(x)(1 - p). The same calculation for a 0 bit, with cost -log(1 - p), moves w the other way by -(L)(x)(p). Both cases are the single update w = w + (L)(x)(bit - p).

One complication is that a steady input of all 0 bits or all 1 bits can cause the weights to become arbitrarily large. In zpaq I limit the range to between -8 and 8.
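The update and the weight clamp can be sketched as follows (a minimal Python illustration, not zpaq's actual code; the learning rate and the inputs are made-up values):

```python
import math

# Logistic mixer: p = sigmoid(sum of w_i * x_i). After coding the bit,
# each weight moves along the negative gradient of the coding cost:
# w_i += L * x_i * (bit - p), i.e. -L*x_i*p for a 0 bit, +L*x_i*(1-p) for a 1.
# Clamping to [-8, 8] (zpaq's range) keeps a steady stream of identical
# bits from pushing the weights arbitrarily far.

L = 0.01          # learning rate (illustrative value)
W_LIMIT = 8.0     # zpaq's weight range

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def update(w, x, bit):
    p = predict(w, x)
    return [max(-W_LIMIT, min(W_LIMIT, wi + L * xi * (bit - p)))
            for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
x = [1.0, -0.5, 2.0]       # inputs, e.g. stretched predictions of other models
for _ in range(1000):      # a steady stream of 1 bits
    w = update(w, x, 1)
```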

2. ## Thanks:

Mauro Vezzosi (24th March 2016)

3. Originally Posted by mpais
Originally Posted by Mauro Vezzosi
I think that an Adaptive Learning Rate (ALR) isn't so useful in ENWIK* because ENWIK* are quite stationary and ALR helps only in the first few dozen MB.
On the contrary, in EMMA, the adaptive learning rate only benefits compression when dealing with stationary sources, because it can only lower the rate, so when you have a file like enwik* where the structure is stationary, it can help get a better convergence on a local minimum.
I was wrong and you were right.
In EMMA the ALR has the same gain ratio on ENWIK5-9, so it is useful regardless of the size of ENWIK*, but it hurts compression on non-stationary data (Maximum Compression tarball):
Code:
```Adaptive learning rate in EMMA.
Configuration: the same as in this post.
ALR off       ALR on  on / off
28           28  1,000000 ENWIK0
36           36  1,000000 ENWIK1
107          107  1,000000 ENWIK2
306          306  1,000000 ENWIK3
2.519        2.534  1,005955 ENWIK4
22.636       22.596  0,998233 ENWIK5
202.117      201.775  0,998308 ENWIK6
1.911.527    1.904.801  0,996481 ENWIK7
17.897.964   17.848.906  0,997259 ENWIK8
148.982.214  148.736.759  0,998352 ENWIK9
Configuration: enabled all models at maximum memory and complexity, disabled Audio models and dictionaries.
8.874.196    8.880.536  1,000714 Maximum Compression (10 files tarball)```
I'm looking for an ALR which adapts faster when the data changes (non-stationary data, or at the beginning of compression); you are looking for a local minimum (stationary data).
Perhaps I need to pay more attention to stationary data.

I added a mixer in CMV which is Mix2(MixN_Fast_Learning_Rate(), MixN_Slow_Learning_Rate()) and the results are here.
It isn't smarter, it's simply better than MixN():
Code:
```    0.1.1  New mixer  New mixer/0.1.1      Maximum Compression ()
9.610.353  9.517.303  0,9903177333861 (1)  10 files individually
9.680.949  9.593.013  0,9909165929910 (2)  10 files tarball
(2) isn't better than (1); the new mixer doesn't adapt better when the data changes.```
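The idea behind that mixer can be sketched as follows (my own illustration; the class and rate values are made up, not CMV's actual code): run two copies of the same mixer, one with a fast and one with a slow learning rate, and let a small second-stage mixer learn which copy to trust.

```python
import math

# Two-speed mixing: Mix2(fast MixN, slow MixN). The fast copy adapts quickly
# when the data changes; the slow copy converges better on stationary data.
# A tiny second-stage mixer weighs their (stretched) outputs.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def stretch(p):
    return math.log(p / (1.0 - p))   # inverse of sigmoid

class Mixer:
    def __init__(self, n, rate):
        self.w = [0.0] * n
        self.rate = rate
    def predict(self, x):
        self.x = x
        self.p = sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))
        return self.p
    def update(self, bit):           # w_i += rate * x_i * (bit - p)
        err = bit - self.p
        self.w = [wi + self.rate * xi * err for wi, xi in zip(self.w, self.x)]

fast = Mixer(2, rate=0.05)    # illustrative rates
slow = Mixer(2, rate=0.002)
mix2 = Mixer(2, rate=0.01)    # second stage: Mix2(fast, slow)

def predict(x):
    return mix2.predict([stretch(fast.predict(x)), stretch(slow.predict(x))])

def update(bit):
    mix2.update(bit)
    fast.update(bit)
    slow.update(bit)
```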
Originally Posted by Darek
CMV v00.01.01 Enwik8 score: 18.122.372 bytes, time: 45678,87s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression verified. Time 43861,21s, SHA1 checksum OK. Memory used: 3335MB.
CMV v00.01.01 Enwik9 score: 149.357.765 bytes, time: 426162,96s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression to be verified. Memory used: 3335MB.
Have you verified the decompression?

Mauro

4. ## Thanks (2):

Darek (27th March 2016),mpais (27th March 2016)

5. Decompression not yet verified.
I was forced to shut off the computer and abort the operation.
~2 days to go...

Darek

6. With great support from Mauro Vezzosi we analysed the first 10 MB of ENWIK8 and tested ENWIK8 and ENWIK9 with the best method found for CMV v00.01.01. Results as follows:

CMV v00.01.01 Enwik8 score: 18.122.372 bytes, time: 45'678,87s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression verified. Time 43'861,21s, SHA1 checksum OK. Memory used: 3335MB.
CMV v00.01.01 Enwik9 score: 149.357.765 bytes, time: 426'162,96s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.5GHz, 16GB, Win7Pro 64. Options "-m2,3,0x03ed7dfb". Decompression verified. Time 394'855,46, SHA1 checksum OK. Memory used: 3335MB.

I've updated decompression info.

Darek

7. ## Thanks (2):

Matt Mahoney (31st March 2016),Mauro Vezzosi (29th March 2016)

8. Thank you for your tests.
Mauro

9. ## Thanks:

Darek (30th March 2016)

10. Originally Posted by Mauro Vezzosi
Thank you for your tests.
Mauro
It was a great challenge! I'm waiting for new releases of CMV!

11. CMV 0.2.0
This version has 3 new models, 1 improved mixer, 1 SSE.
CMV has only 2 specific models (for text and exe) and it isn't competitive with other top compressors on modelled data; instead, it stands a good chance on non-modelled data.
The 32- and 64-bit versions are compatible, with the exception of the memory limit and the PPM model (bit 27 of the options): they compress slightly differently if the PPM model is enabled, due to a slightly different prediction returned by mod_ppmd (the 32-bit build seems to be a little worse).
CMVE is an "extreme" version: it uses ~4x the memory (it needs a maximum of ~25 GiB RAM), the final stage has 15->3->1 mixers instead of 6->3->1, some models and mixers are improved, and it's slower than the standard version.
Use CMV* only for tests.
Thanks to Shelwien for his mod_ppmd library.
A big THANKS to Darek, who tested the alpha and beta versions and supported me in the last year during the development of this version.

Code:
```The CMV help is pretty cryptic, here are some examples:
cmv c infile cmvfile              (compress infile to cmvfile with standard -m1,0 method)
cmv e cmvfile outfile             (expand cmvfile to outfile)
cmv c -m2,3 infile cmvfile        (method -m2 and main memory level 3)
cmv c -m1,0,+ infile cmvfile      (standard method + some other optional models)
cmv c "-m2,0,*" infile cmvfile    (method -m2 + all optional models)
cmv ce -mx infile cmvfile outfile (compress infile to cmvfile then expand cmvfile to outfile, all models/mixers/options are switched on, it needs ~6.0/6.5 GiB, very slow)```
Code:
```Some benchmarks for standard version with switch -mx
Maximum Compression
819.212 A10.jpg
973.403 AcroRd32.exe
371.503 english.dic
3.554.682 FlashMX.pdf
256.511 FP.LOG
1.334.460 MSO97.DLL
677.186 ohs.doc
633.015 rafale.bmp
385.383 vcfiu.hlp
352.302 world95.txt
9.357.657 Total
9.363.199 Tarball
Silesia Open Source Compression Benchmark
1.988.153 dickens
9.754.826 mozilla
2.000.839 mr
914.998 nci
1.615.995 ooffice
2.037.774 osdb
791.623 reymont
2.710.815 samba
3.763.651 sao
5.020.376 webster
3.555.625 x-ray
270.932 xml
34.425.607 Total
Large Text Compression Benchmark
24 ENWIK0
32 ENWIK1
88 ENWIK2
286 ENWIK3
2.927 ENWIK4
24.993 ENWIK5
208.110 ENWIK6
1.898.421 ENWIK7
17.308.303 ENWIK8 (-mx)
17.344.130 ENWIK8 (-m2,3,0x5be90df7)
16.630.357 ENWIK8.drt (-m2,3,0x5be90df7)
135.668.912 ENWIK9.drt (-m2,3,0x5be90df7)
Calgary corpus (14 files)
20.656 BIB
190.195 BOOK1
120.387 BOOK2
43.007 GEO
89.360 NEWS
7.370 OBJ1
42.973 OBJ2
13.049 PAPER1
20.428 PAPER2
32.917 PIC
9.562 PROGC
10.802 PROGL
7.161 PROGP
11.508 TRANS
619.375 Total
596.727 Tarball
Canterbury corpus
34.910 alice29.txt
32.845 asyoulik.txt
5.835 cp.html
2.245 fields.c
948 grammar.lsp
8.752 kennedy.xls
83.333 lcet10.txt
121.716 plrabn12.txt
32.917 ptt5
7.412 sum
1.399 xargs.1
332.312 Total
324.463 Tarball
Other files
32.613.945 10 GB Compression Benchmark - 100mb.tar (100mb subset) (tarball)
813.102.039 Huge Files Compression Benchmark - vm.dll (-m2,3)```

12. ## Thanks (5):

comp1 (15th October 2017),Darek (15th October 2017),load (16th October 2017),Mike (15th October 2017),Stephan Busch (15th October 2017)

13. Blog
Code:
```2019/04/05
CMV version 0.2.0, DNA Corpus https://encode.su/threads/2105-DNA-Corpus.

cmix: v17.
CMo8': 2019/03/29.
cmv: 0.2.0, best of -m0,0,0x7fededff (-m0,0,>), -m0,0,0x7fede51e.
paq8px_v178: best of -7, -7a, -7f, -7af, -9, -9a, -9f, -9af.
paq8pxd64 (SSE2 build using IC19 from Shelwien): best of -s7, -s9.
DNA compr.: best of DNA compressors (https://encode.su/threads/2105-DNA-C...ll=1#post59591).
cmv: 0.2.0, optimized options.

cmix   CMo8'     cmv   paq8px  paq8pxd  |   DNALi  DNA compr.  |  cmv optimized options
chmpxx :   121024   27299   27334   27240*   27420    27550  |   24832       24832  |   27240 -m0,0,0x7fede51e
chntxx :   155844   37349   37312*  37336    37417    37541  |   31112       31112  |   37327 -m0,0,0x3c60cd7e
hehcmv :   229354   54522*  55309   54970    54644    54923  |   52513       52513  |   54950 -m1,0,0x5ce39dbf
humdyst:    38770    9295    9244*   9287     9286     9380  |    9161        9161  |    9278 -m0,0,0x7fac69fe
humghcs:    66495    9692   12129    8078*    9840     9834  |    8082        7896  |    7945 -m2,0,0x73a169fe
humhbb :    73308   16742   16956   16442*   16802    16902  |   15959       15959  |   16391 -m2,0,0x73a1edbe
humhdab:    58864   12954   13232   12461*   13058    13156  |   12192       12192  |   12366 -m2,1,0x7ba96d6e
humprtb:    56737   12780   13001   12414*   12847    12943  |   12253       12198  |   12331 -m2,0,0x73a9edaf
mpomtcg:   186609   44616   45210   44458*   44743    44891  |   43493       43493  |   44305 -m2,1,0x70aba5fa
mtpacga:   100314   23113   23178   23050*   23150    23263  |   23124       23122  |   22988 -m2,3,0x7fa9e5e8
vaccg  :   191737   44153   44186   44152*   44186    44317  |   42043       42043  |   44113 -m2,1,0x5ce2ed76
Total  :  1279056  292515  297091  289888*  293393   294700  |  274764      274521  |  289237

2019/03/31
- Results of the compression of E.coli: https://encode.su/threads/3080-3-fai...ll=1#post59619

2018/08/??-2019/01/21
- I'm trying to add RNN to cmv, it's hard to find and use an RNN library that is simple and has the features I want.
Tests with cmv c "-m0,0,<" (with small changes after 0.2.0) on MaximumCompression corpus, the input of the RNN are just the bytes already seen:
- lstm-compress 1: 2018/05/20 with cmix 2018/08/19 changes, LSTM (forget and input gates are coupled, num_cells = 32, num_layers = 3, horizon = 10, learning_rate = 0.05, gradient_clip = 2.0, init_low = -0.2, init_range = 0.4, vocab = 256).
- lstm-compress 2: lstm-compress 1 with num_layers = 1.
- lstm-compress 3: lstm-compress 1 with num_layers = 1, horizon = 4.
- kann  1: GRU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.020, decay = 0.95); x = last  1 byte , y = last 1    byte.
- kann  2: MGU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.015, decay = 0.99); x = last  1 byte , y = last 1    byte.
- kann  3: kann 2 with GRU.
- kann  4: kann 2 with LSTM.
- kann  5: MGU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.015, decay = 0.99); x = last  3 bytes, y = last 3    bytes.
- kann  6: kann 5 with GRU.
- kann  7: kann 5 with LSTM.
- kann  8: MGU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.015, decay = 0.99); x = last  7 bytes, y = last 7    bytes.
- kann  9: kann 8 with GRU.
- kann 10: kann 8 with LSTM.
- kann 11: MGU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.015, decay = 0.99); x = last  7 bytes, y = last 4- 7 bytes.
- kann 12: kann 11 with GRU.
- kann 13: MGU (n_h_layers = 1, n_h_neurons = 32, rnn_flag = KANN_RNN_VAR_H0, norm = 0, ulen = 8, mbs = 2) + KANN_C_CEM (softmax + multi-class cross-entropy); RMSprop (lr = 0.015, decay = 0.99); x = last 10 bytes, y = last 7-10 bytes.
- nnetcpp: LSTM (nlayer = 1, nhidden= 32, learning_rate = 0.02, decay = 0.95, bias_initialized_at_one = false) + sigmoid + Dense (learning_rate = 0.02, decay = 0.95, bias_initialized_at_one = false) + sigmoid; Dense::momentum = 0.1; RMSprop; timestep = 0.
PERHAPS WRONG: MUST BE RECHECKED.
- tiny-dnn (tiny-rnn): [still to write]
PERHAPS WRONG: MUST BE RECHECKED.
Standard lstm-compress 1 lstm-compress 2 lstm-compress 3     kann 1     kann 2     kann 3     kann 4     kann 5     kann 6     kann 7     kann 8     kann 9    kann 10    kann 11    kann 12    kann 13    nnetcpp   tiny-dnn
A10.jpg         825.492         825.413         825.411         825.411    825.708    825.796    825.768    825.689    825.800    825.798    825.719    825.792    825.814    825.725    826.722    826.736    825.810    825.711    826.116 A10.jpg
AcroRd32.exe  1.489.502       1.445.631       1.487.765       1.489.617  1.424.689  1.418.343  1.461.707  1.473.547  1.355.406  1.357.539  1.454.360  1.343.308  1.344.883  1.444.427  1.330.331  1.331.602  1.344.801  1.476.933  1.465.975 AcroRd32.exe
english.dic     615.412         611.444         615.220         615.443    577.600    576.947    581.910    602.515    546.354    551.234    587.199    539.869    543.220    580.939    540.144    543.552    543.854    614.480    607.751 english.dic
FlashMX.pdf   3.702.673       3.698.941       3.702.531       3.702.542  3.699.276  3.698.209  3.700.063  3.699.717  3.696.361  3.696.373  3.698.839  3.693.607  3.693.821  3.694.846  3.695.457  3.695.655  3.693.713  3.701.418  3.701.262 FlashMX.pdf
FP.LOG          717.366         717.251         717.398         717.410    712.134    712.248    710.492    710.955    646.025    647.517    641.907    589.657    591.905    585.917    589.167    591.260    583.147    716.104    714.494 FP.LOG
MSO97.DLL     1.848.577       1.821.098       1.847.302       1.848.287  1.778.674  1.774.985  1.803.302  1.827.322  1.704.127  1.705.771  1.794.859  1.693.697  1.695.802  1.780.768  1.681.166  1.683.123  1.697.752  1.833.819  1.824.738 MSO97.DLL
ohs.doc         841.723         839.466         841.764         841.784    835.241    833.821    836.217    837.514    826.496    827.540    833.922    822.045    822.330    832.349    820.246    820.280    820.275    839.691    838.297 ohs.doc
rafale.bmp      755.150         753.311         755.359         755.335    748.249    748.315    748.333    748.631    740.970    741.098    741.577    734.511    734.465    731.064    735.216    734.945    734.724    750.809    750.378 rafale.bmp
vcfiu.hlp       714.046         711.394         714.150         714.207    697.972    695.089    695.095    703.181    666.156    667.009    696.594    630.743    634.342    681.774    627.050    630.810    624.971    708.921    708.773 vcfiu.hlp
world95.txt     635.046         634.688         635.044         635.128    627.909    627.925    627.724    631.847    620.740    621.062    624.434    616.630    617.038    615.461    616.967    617.422    616.036    633.319    630.098 world95.txt
Total        12.144.987      12.058.637      12.141.944      12.145.164 11.927.452 11.911.678 11.990.611 12.060.918 11.628.435 11.640.941 11.899.410 11.489.859 11.503.620 11.773.270 11.462.466 11.475.385 11.485.083 12.101.205 12.067.882 Total

2018/07/25-2018/10/19
- Re-analyzed (optimized) Darek's corpus/testbed with rel. 0.2.0:
0.WAV  1.381.381 -m1,3,0x7cab709d
1.BMP    303.182 -m2,3,0x7fa1745f
A.TIF    743.478 -m1,2,0x7e6de51f
B.TGA    691.121 -m1,3,0x7eed651f
C.TIF    317.774 -m1,2,0x3860e38f
D.TGA    302.753 -m2,1,0x78616b9f
E.TIF    496.019 -m1,3,0x7ba2bb5f, -m2,3,0x7ba2fb5f, -m2,3,0x7ba2fbdf
F.JPG    110.753 -m0,0,0x68e51dd0
G.EXE  1.355.544 -m1,3,0x7fa5fddf
H.EXE    459.991 -m1,3,0x7aede5df
I.EXE    216.999 -m1,2,0x7debe5df
J.EXE     43.116 -m2,2,0x7feb0d4d
L.PAK  2.745.970 -m2,1,0x7fe9759f
M.DBF     51.563 -m2,0,0x3ca06dde, -m2,3,0x3ca06cde, -m2,3,0x3ca06dde
O.APR      3.646 -m2,1,0x7ce8abde
P.FM3        879 -m2,0,0x3c68e1d7, -m2,1,0x3c68e197, -m2,1,0x3c68e1d7, -m2,2,0x3c68e197, -m2,2,0x3c68e1d7, -m2,3,0x3c68e9b7, -m2,3,0x3c68e197, -m2,3,0x3c68e1d7, -m2,3,0x3c68f1d7, -m2,3,0x3c68e1d7, -m2,3,0x3c68e097, -m2,3,0x3c68f197
Q.WK3    168.055 -m1,2,0x7cebf4df
R.DOC     28.583 -m1,1,0x3feb6d56
S.DOC     25.246 -m1,3,0x3fe37c5d
T.DOC     18.201 -m1,2,0x7fe36d95
U.DOC      8.501 -m0,0,0x7eed6d14, -m0,0,0x7eeb6d56, -m0,0,0x7ee96d56, -m1,0,0x7eed6d56
V.DOC     18.037 -m1,0,0x3fa07d94, -m1,1,0x3fa07d94, -m1,2,0x3fa07d94, -m1,3,0x3fa07d14, -m1,3,0x3fa07c94, -m1,3,0x3fa07d94
W.DOC     12.918 -m1,0,0x3fe96c14, -m1,1,0x3fe96c14, -m1,2,0x3fe96c14, -m1,3,0x3fe96c14
X.DOC     10.771 -m1,0,0x39ea6d54, -m1,0,0x39aa6d54, -m1,1,0x39ea6d54, -m1,2,0x39ea6d54, -m1,3,0x39ea6d54
Y.CFG        312 -m0,2,0x27ed7bd7, -m1,3,0x27ed7bd7, -m2,0,0x27ed7bd7, -m2,1,0x27ed7bd7, -m2,2,0x27ed7bd7, -m2,3,0x27ed7bd7, -m2,3,0x27ed5bd7, -m2,3,0x27ed7bd3, -m2,3,0x27ed7dd7, -m2,3,0x27ed7b97, -m2,3,0x27ed7b57, -m2,3,0x27ed7ad7, -m2,3,0x27ed73d7, -m2,3,0x2fed7bd7
Z.MSG        165 -m0,0,0x29ed7dd7, -m0,1,0x29ed7dd7, -m0,2,0x29ed5dd7, -m0,2,0x29ed7dd1, -m0,2,0x29ed7dd3, -m0,2,0x29ed7dd5, -m0,2,0x29ed7dd7, -m0,2,0x29ed7dc7, -m0,2,0x29ed3dd7, -m0,2,0x2bed7dd7, -m0,2,0x29ed7d57, -m0,2,0x29ed7d97, -m0,2,0x29ed7cd7, -m0,2,0x29ed75d7, -m0,3,0x29ed7dd7
Total 12.127.230

2018/05/26
- I had an idea (already known?):
- I think that a predictor is a mixer and a mixer is a predictor; they are implemented in different ways to be more efficient.
- Does SSE/APM add a layer to the predictor, like in a multi-layer mixer?
- If a predictor is a mixer, then we can have a gated predictor like an LSTM RNN, with input, forget and output gates: is it worth trying?

2018/05/??
- lstm-compress: benchmarking 2018/05/20 version.

2018/05/??
- I understood ACB better and tried to write a good LZ compressor, but ATM it's slow and quite bad.

2018/04/??
- I'm trying a bucket version of the match model, but it does not look as good as I hoped.

2018/03-04/??
- I worked on paq8px and posted paq8px_v141.

2018/0?/??
Benchmarks for version 0.2.0, Squeeze Chart (PDF, JPG, MP3, PNG, Installer.. (Compressing Already Compressed Files)), results not verified.
0.1.1       0.1.1       0.1.1       0.2.0
-m1,3,+     -m2,0,+     -m2,0,*     -m2,0,+
Documents
5.445.616   5.452.943   5.446.604   5.453.278 busch2.epub
87.474.099  84.864.362              84.819.962 diato.sf2
38.532.255  38.534.942              38.529.249 freecol.jar
12.582.707  12.571.833              12.551.119 maxpc.pdf
85.128.562                                     SoniMusicae-Diato-sf2.zip
791.766     791.372     791.405     789.882 squeeze.xlsx
Image Formats (Camera Raw)
22.065.306  22.043.852              22.042.153 canon.cr2
11.325.973  11.109.503              11.054.666 fuji.raf
33.826.108                          33.003.219 leica.dng
32.532.433  32.532.436              32.522.671 nikon.nef
15.862.814  15.859.052  15.856.336  15.861.867 oly.orf
16.110.809                          15.663.046 pana.rw2
11.947.899  11.978.628              11.971.763 sigma.x3f
15.032.909  14.945.507  14.669.162  14.929.595 sony.arw
20.438.149  20.288.840              20.242.592 sony2.arw
Image Formats (web)
1.782.552   1.781.843   1.773.735   1.779.989 filou.gif
4.638.655   4.642.178   4.639.766   4.641.755 flumy.png
6.828.506   6.834.776   6.828.513   6.835.693 mill.jpg
Installers
101.253.297 102.250.965             102.301.073 amd.run
189.818.215 184.804.805             184.201.747 cab.tar
54.115.084  54.089.566              54.065.461 inno.exe
18.551.343  18.528.542  18.526.117  18.518.877 setup.msi
53.701.756  54.112.501              54.635.592 wise.exe
Interactive files
340.557     340.066                 338.943 flyer.msg
5.900.341   5.898.822               5.895.275 swf.tar
Scientific Data
26.924      14.915       7.167      13.525 block.hex
167.944.593 147.688.356             147.537.329 msg_lu.trace
116.023.192 110.989.967             110.913.251 num_brain.trace
36.433.304  34.696.885              34.689.966 obs_temp.trace
Songs (Tracker Modules)
6.645.290   6.501.083   6.456.087   6.463.836 it.it
15.791.191  15.354.477              15.255.383 mpt.mptm
7.428.792   7.286.070   7.180.942   7.244.534 xm.xm
Songs (web)
19.246.345  19.245.422              19.238.470 aac.aac
16.508.192  16.520.159  16.507.079  16.508.933 diatonis.wma
127.728.723 127.419.074             127.369.847 mp3corpus.tar
36.334.963  36.332.287              36.323.040 ogg.ogg
Videos (web)
15.142.319  15.139.010  15.133.624  15.134.093 a55.flv
32.153.051  32.151.082              32.149.257 h264.mkv
162.890.183 162.883.854             162.849.083 star.mov
96.324.466  96.480.593              96.361.543 van_helsing.ts
-m1,3,+       -m2,3         -mx             Squeeze Chart
68.364.894  67.828.879  65.646.788  66.860.722 squeezechart_app.tar (tarball)

2018/01/??
- lstm-compress: I did some tests changing internal parameters; the results have not been posted (update: they were posted some months later).

2017/12/??
- I tried to inherit weights in the mixer; most of the time it is a little bit better, but it's not worth it.

2017/12/??
- I tried SWE/AWM (secondary weight estimation/adaptive weight map) and I didn't find any gain.

>= 2017/11/15
- I stopped working on the paq8* DMC model (started 2017/10/19) because I hadn't found any stable improvement.
bpos0
Clone_Bit_Precision_10,14,16
Clone_Condition_1..17
Clone_Round_1,2
Cnt_Max_2048,8192,16384
Dynamic_Threshold_1,1
Halve_1,2,3
Init255
Init255_Keep_C01
Init255_Keep_C01_2
Init255_Keep_C01_State
Init255_Keep_C01_State_2
Init_State0x00,0x55,0xaa,0xff
Stretch_Divisor_Pr1_2,3,5,8
Stretch_Divisor_Pr2_2,3,5,8
Stretch_Divisor_Pr_Dynamic_1,2,3

2017/11/14
- Fixed the output size displayed at the end when the output file is "." (stdout) in the "c" command (until now it was 0).
cmv c book1 . > nul
Wrong: In     768771 out          0 ratio 0.26297 bpc 2.1037 time 51s92 (51.92s)
^^^^^^
Right: In     768771 out     202161 ratio 0.26297 bpc 2.1037 time 51s92 (51.92s)
^^^^^^

2017/10/29
- The FCM model (bits 15-16) didn't free the memory allocated by ICM; now it's fixed by freeing the memory in ICM.end().
The main problem was in command "a", where the allocated memory increased at every analysis (if FCM was used).

2017/10/19
- I started looking at whether I can improve the paq8* DMC model; it seems there are many things to try changing.

2017/10/14
- Released official 0.2.0 version.

2017/09/30
- Gap model 1: probably now the mixer is a bit faster.

2017/09/30
- Sped up the initialization of the match models (about -23 sec. on my computer if all match models are enabled).

2017/09/26~
- Extreme version: added 3 more predictors (from 1 to 4) to the OIB model.

2017/09/21
- Changed "bpb" to "bpc".

2017/09/??
- Added "Data compression with release ..., method ..." in command "c" with switch -vv.
- Command "ac": fixed a bug; in some cases the extended flag was not set in the compressed file and decompression ignored the extended switches.
- Now the switch -vo no longer disables output to the display, so it outputs to the file AND to the display.

2017/08/16
- Fixed the last block compression ratio in the ETA display.
cmv c -m0 -vn pi1000000.txt tf1
Wrong: 977K100.0%-> 407K/7065B  407K 0.41661/0.10780 3.3328/0.8624 04s77+00s00=04s77
^^^^^^^
Right: 977K100.0%-> 407K/7065B  407K 0.41661/0.41652 3.3328/3.3322 05s04+00s00=05s04
^^^^^^^

2017/08/13
- Now it is possible to specify more than one method option, e.g. -m1,0,&b0|0x0200^0x06, -m1,0,0x03ec019f&0x05.

2017/08/06 (2017/11/12 Added -max Calgary corpus and Canterbury corpus, 2017/12/26 Added Lossless Photo Compression Benchmark)
Benchmarks for version 0.2.0 beta 1 (2017/08/06, VOM model is disabled, mod_ppmd enabled), compared to the last official (0.1.1) and previous development (2016/06/08 0.2.0 ?, 2016/11/28 >0.2.0a3 (VOM model is disabled) and 2017/05/23 >0.2.0a5 (VOM model is disabled)) versions.
The 0.2.0 beta 1 standard version has the same compression as the 0.2.0 standard final version (they should be bitwise identical).
0.1.1 -mx   0.2.0 ? -mx >0.2.0 a3 -mx >0.2.0 a5 -mx  0.2.0 b1 -mx Maximum Compression
2016/01/10    2016/06/08    2016/11/28    2017/05/23    2017/08/06
820.501       819.931       819.582       819.172       819.212 A10.jpg
1.017.441       997.721       976.062       974.342       973.403 AcroRd32.exe
400.343       381.471       375.731       371.535       371.503 english.dic
3.574.241     3.566.950     3.555.244     3.554.780     3.554.682 FlashMX.pdf
280.132       267.665       258.477       256.850       256.511 FP.LOG
1.387.079     1.365.962     1.338.128     1.335.340     1.334.460 MSO97.DLL
687.731       683.352       678.767       677.244       677.186 ohs.doc
653.728       634.380       633.578       633.062       633.015 rafale.bmp
399.793       391.522       386.441       385.870       385.383 vcfiu.hlp
372.831       362.592       356.292       352.845       352.302 world95.txt
9.593.820     9.471.546     9.378.302     9.361.040     9.357.657 Total
9.632.713     9.515.604     9.386.444     9.366.438     9.363.199 MaxCompr.tar (very close to single file compression total :-))
100.031       100.031       100.031 sharnd_challenge.dat
34            34            35 Test_000
0.1.1 -mx   0.2.0 ? -mx         0.2.0a2      0.2.0a3   >0.2.0 a3 -mx  0.2.0 b1 -mx Silesia Open Source Compression Benchmark
2016/01/10    2016/06/08   2016/07/02-06   2016/09-10      2016/11/28    2017/08/06
2.047.734     2.016.906       2.015.523    2.003.326       1.999.489     1.988.153 dickens
10.117.960     9.958.936       9.942.493    9.812.955       9.800.725     9.754.826 mozilla
2.065.773     2.004.632       2.003.968    2.002.364       2.003.809     2.000.839 mr
969.798       932.177         931.890      925.225         920.860       914.998 nci
1.698.391     1.662.164       1.658.048    1.625.486       1.622.001     1.615.995 ooffice
2.081.744     2.052.039       2.052.089    2.042.336       2.042.487     2.037.774 osdb
830.294       813.002         812.739      804.606         799.896       791.623 reymont
2.799.600     2.764.836       2.762.207    2.740.579       2.725.908     2.710.815 samba
3.807.684     3.776.032       3.775.953    3.764.244       3.764.346     3.763.651 sao
5.292.616     5.184.527       5.177.512    5.126.226       5.080.963     5.020.376 webster
3.577.455     3.556.659       3.555.720    3.554.770       3.556.252     3.555.625 x-ray
281.484       276.571         276.435      275.044         272.247       270.932 xml
35.570.533    34.998.481      34.964.577   34.677.161      34.588.983    34.425.607 Total
0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    >0.2.0 a5 -mx     0.2.0 b1 -mx Large Text Compression Benchmark
2016/01/10     (>&b12)   2016/06/08      2016/11/28       2017/05/23       2017/08/06
24           24              24               24               24 ENWIK0
31           32              32               32               32 ENWIK1
91           89              88               89               88 ENWIK2
299          287             284              286              286 ENWIK3
2.997        2.953           2.922            2.925            2.927 ENWIK4
25.568       25.248          25.076           24.998           24.993 ENWIK5
214.137      211.478         209.352          208.326          208.110 ENWIK6
1.968.717    1.941.513       1.915.141        1.900.749        1.898.421 ENWIK7
18.153.319   17.898.994      17.650.885       17.325.679       17.308.303 ENWIK8 (-mx)
17.344.130 ENWIK8 (-m2,3,0x5be90df7)
16.677.314       16.630.357 ENWIK8.drt (>0.2.0 a5 -m2,3,0x53e90df7, 0.2.0 b1 -m2,3,0x5be90df7)
text8 / ENWIK8
0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx    ~0.2.0 a5 -mx     0.2.0 b1 -mx Large Text Compression Benchmark
16/01/2017
18.001.449                                                     17.418.878 text8 (-mx)
0,991634037                                                    1,006388552 text8 / ENWIK8
17.478.657 text8 (-m2,3,0x5be90df7)
1,007756342 text8 / ENWIK8
139.245.172      135.668.912 ENWIK9.drt (>0.2.0 a5 -m2,3,0x53e90df7, 0.2.0 b1 -m2,3,0x5be90df7)
0.1.1 -m2,3,0x03ededff         -max                    0.2.0 b1 -mx    0.2.0 -max                  Calgary corpus (14 files)
2016/01/10     (>&b12)   2016/01/10                      2017/08/06    2017/11/02
21.365       21.328 (2,2,0x03ed6dfb)         20.656        20.539 (1,3,0x7ba16cb5) BIB
194.417      194.016 (2,3,0x03eb7df5)        190.195       189.638 (2,0,0x3fa8ade4) BOOK1
123.581      123.444 (2,3,0x03ed6df7)        120.387       120.006 (2,0,0x3fe36c27) BOOK2
43.749       43.687 (1,3,0x02abe5fd)         43.007        42.950 (1,0,0x7fa9abfe) GEO
91.994       91.914 (2,3,0x03ed6cfd)         89.360        89.095 (1,1,0x3fa36cf4) NEWS
7.671        7.655 (2,3,0x03ededfb)          7.370         7.313 (1,3,0x7ce92ca7) OBJ1
44.766       44.760 (2,3,0x03ededfd)         42.973        42.820 (1,0,0x7deb6cbd) OBJ2
13.392       13.334 (2,1,0x03ed6cf6)         13.049        12.971 (1,3,0x3be32db4) PAPER1
20.847       20.737 (1,3,0x03ed7df4)         20.428        20.293 (1,3,0x3ba86c34) PAPER2
34.149       33.870 (0,1,0x00eb6bbf)         32.917        32.488 (0,0,0x3cea623e) PIC
9.796        9.766 (2,3,0x03ed7df4)          9.562         9.509 (1,3,0x3aeb4df5) PROGC
11.105       11.084 (2,3,0x03ed7d7c)         10.802        10.750 (1,3,0x7feb4d77) PROGL
7.469        7.453 (2,3,0x03e55df8)          7.161         7.121 (1,3,0x7ae34dfb) PROGP
11.876       11.856 (2,3,0x03ed65fd)         11.508        11.456 (1,3,0x7be94cb7) TRANS
636.177      634.904                         619.375       616.949                  Total
613.782      613.758 (2,3,0x03ededbf)        596.727       596.012 (2,3,0x7fe3653f) Tarball
0,964797533  0,966694177                     0,963434107   0,966063645                  Tarball / Total
0.1.1 -m2,3,0x03ededff         -max                    0.2.0 b1 -mx    0.2.0 -max  Canterbury corpus
2016/01/10     (>&b12)   2016/01/10                      2017/08/06    2017/11/02
35.729       35.567 (2,3,0x03eb6df0)         34.910        34.666 (2,1,0x3fa06d60)  alice29.txt
33.504       33.365 (1,3,0x03ed7cf6)         32.845        32.668 (1,3,0x3ba22d70)  asyoulik.txt
6.016        5.994 (2,3,0x03ed7d74)          5.835         5.796 (0,3,0x3eeb4db0)  cp.html
2.289        2.268 (2,3,0x03ed7df4)          2.245         2.217 (1,3,0x6deb6df4)  fields.c
974          962 (2,3,0x03e57df6)            948           936 (0,3,0x6fed2df0)  grammar.lsp
9.293        9.101 (2,3,0x00a92dfc)          8.752         8.581 (2,0,0x1ca16bff)  kennedy.xls
85.378       85.130 (2,3,0x03ed6df6)         83.333        82.959 (2,0,0x3fe86d74)  lcet10.txt
123.884      123.532 (2,3,0x03a57cf2)        121.716       121.167 (2,0,0x3ba82c70)  plrabn12.txt
34.149       33.870 (0,1,0x00eb6bbf)         32.917        32.488 (0,0,0x3cea623e)  ptt5
7.629        7.614 (2,3,0x02ed6d7d)          7.412         7.383 (1,3,0x7de36dff)  sum
1.429        1.413 (2,3,0x03e57df6)          1.399         1.379 (1,3,0x6feb6df2)  xargs.1
340.274      338.816                         332.312       330.240                   Total
331.931      331.343 (2,3,0x03eb6dbf)        324.463       323.574 (2,0,0x3fe9653e)  Tarball
0,975481524  0,977943781                     0,976380630   0,979814680                   Tarball / Total
0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx   >0.2.0 a5 -mx   0.2.0 b1 -mx Other
2016/01/10     (>&b12)   2016/06/08      2016/11/28      2017/05/23     2017/08/06
194.417      192.013         191.081         190.221        190.195 book1
613.782      605.705         599.057         597.065        596.727 Calgary corpus.tar
331.931      327.901         325.392         324.517        324.463 Canterbury corpus.tar
0.1.1 -mx                                >0.2.0 a5 -mx   0.2.0 b1 -mx Wratislavia XML Corpus - http://pskibinski.pl/research/Wratislavia/
2016/01/10                                   2017/05/23     2017/08/06
1.103.033                                    1.074.470      1.073.661 shakespeare.xml
56.836                                       54.461         54.408 uwm.xml
0.1.1 -m2,3,0x03ededff                >0.2.0 a3 -mx   >0.2.0 a5 -mx   0.2.0 b1 -mx 10 GB Compression Benchmark
2016/01/10     (>&b12)                   2016/11/28      2017/05/23     2017/08/06
33.024.880                   33.072.180      32.621.364     32.613.945 100mb.tar (100mb subset) (tarball)
0.1.1           0.2.0 Huge Files Compression Benchmark
1.028.583.854   1.012.368.893 vm.dll -m0,3
840.486.023     834.738.054 vm.dll -m1,3
(0.1.1?)  820.313.136     813.102.039 vm.dll -m2,3
0.1.1 -m2,3,0x03ededff    0.2.0 -mx Lossless Photo Compression Benchmark - http://imagecompression.info/gralic/LPCB.html
2016/01/10     (>&b12)   2017/11/02
11.406.685   11.123.604 canon_eos_1100d_03.ppm
9.652.792    9.202.445 fujifilm_finepix_x100_03.ppm
9.539.132    9.106.924 olympus_xz1_16.ppm
522.728      510.921 PIA12813.ppm
92.833       90.855 PIA13799.ppm
1.136.786    1.126.433 STA13456.ppm
675.198      642.012 STA13781.ppm
33.026.154   31.803.194 Total
0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx   >0.2.0 a5 -mx   0.2.0 b1 -mx Compression Competition -- \$15,000 USD
2016/01/10     (>&b12)   2016/06/08      2016/11/01      2017/05/23     2017/08/06
14.897.825   14.782.006      14.737.701      14.710.428     14.709.548 SRR062634.filt.fastq
0.1.1 -m2,3,0x03ededff    0.2.0 -mx   >0.2.0 a3 -mx   >0.2.0 a5 -mx   0.2.0 b1 -mx Specific case - High redundant data in a pattern
2016/01/10     (>&b12)   2016/06/08      2016/11/28      2017/05/23     2017/08/06
1.242        1.282           1.200           1.235          1.247 NUM.txt
0.1.1 -m2,3,0x03ededff   >0.2.0 a3 -mx   0.2.0 b1 -mx Testing compressors with artificial data
2016/01/10     (>&b12)      2016/11/28     2017/08/06
1.000.048       1.000.070      1.000.066 a6000
875.600         875.617        875.616 a6001
500.888         501.254        501.278 a6004
126.131         126.717        126.683 a6007
71              95             92 b6002
71              84             85 b6004
73              75             75 b6010
148              93            101 b6100
23              23             23 c0000
68              67             67 c6000
74             118            124 c6001
101              67             67 c6255
200             162            165 d6001
80             166            173 d6016
71             105            102 d6128
179             204            204 i6002
181             212            200 i6004
265             190            193 i6010
488             421            423 i6100
247             175            178 l6002
214             212            205 l6003
211             200            186 l6004
229             192            210 l6008
387             260            230 m6002
203             240            244 m6003
339             296            304 m6004
386             383            319 m6008
500.830         500.645        500.651 p6002
501.924         500.545        500.542 p6004
503.587         500.614        500.609 p6010
515.730         511.624        511.561 p6100
500.575         500.743        500.747 r6002
250.986         250.548        250.554 r6004
101.881         100.584        100.600 r6010
13.215          11.642         11.461 r6100
500.618         500.860        500.866 s6002
250.616         250.640        250.655 s6004
101.177         100.676        100.687 s6010
11.404          10.625         10.591 s6100
5.043           5.037          5.037 w4002
50.077          50.048         50.049 w5002
500.185         500.114        500.134 w6002
100.296         100.121        100.150 w6010
10.235          10.123         10.140 w6100
5.000.429       5.000.739      5.000.711 w7002
50.076.893      96.493.563     96.348.708 w8002
62.002.677     108.407.189    108.262.066 Total
0.1.1 -m2,3,0x03ededff   PPMonstr J   >0.2.0 a3 -mx   >0.2.0 a5 -mx   0.2.0 b1 -mx  Generic Compression Benchmark
2016/01/10     (>&b12)   2006/02/16      2016/11/28      2017/05/23     2017/08/06
2.923.969    3.003.153       2.865.327       2.858.429      2.859.568  uiq2-seed'0'-n1000000
2.924.115    3.002.768       2.865.090                      2.859.158  uiq2-seed'1'-n1000000
2.924.322    3.003.251       2.865.234                      2.859.348  uiq2-seed'2'-n1000000
2.925.567    3.004.457       2.866.751                      2.860.813  uiq2-seed'3'-n1000000
2.922.792    3.001.321       2.864.111                      2.858.104  uiq2-seed'4'-n1000000
2.924.509    3.003.981       2.865.290                      2.859.345  uiq2-seed'5'-n1000000
2.924.659    3.003.513       2.865.911                      2.860.110  uiq2-seed'6'-n1000000
2.924.616    3.003.203       2.865.552                      2.859.637  uiq2-seed'7'-n1000000
2.925.579    3.004.358       2.867.011                      2.861.075  uiq2-seed'8'-n1000000
2.926.293    3.004.433       2.867.016                      2.861.078  uiq2-seed'9'-n1000000
29.246.421   30.034.438      28.657.293                     28.598.236  Total
0,97376289   1,00000000      0,95414780  0,95180932     0,95218149  Size (Ratio)
Prime Number Benchmark
msb     lsb     dbl     hex    text   twice interlv   delta bytemap  bitmap    total  tarball tarball/total (total = 4.803.388; tarball = 4.812.800, 7z.exe a -ttar)
39.547  39.314  40.403  41.851  39.019  39.569  40.831  34.251  25.789  32.110  372.684  292.304 0,78432130169 cmv -mx  0.1.1
39.379  39.181  40.011  41.629  38.829  39.400  40.548  33.999  23.851  31.659  368.486  290.754 0,78905033027 cmv -max 0.1.1
38.233  37.987  38.897  41.233  38.301  38.261  39.570  33.944  24.149  31.909  362.484  284.835 0,78578640712 cmv -mx  0.2.0 (2016/06/08)
37.963  37.679  38.711  40.893  38.131  37.995  39.136  33.925  22.864  31.895  359.192  281.555 0,78385654469 cmv -mx  >0.2.0 a3 (2016/11/28)
37.834  37.510  38.600  40.737  38.016  37.863  39.133  33.856  22.880  31.850  358.279  280.948 0,78415983075 cmv -mx  >0.2.0 a5 (2017/08/06)
37.866  37.547  38.597  40.748  38.042  37.897  39.098  33.859  22.929  31.816  358.399  280.759 0,78336993128 cmv -mx  0.2.0 b1 (2017/05/23)
37.835  37.973  38.597  41.619  38.829  37.894  36.255  33.726  23.851  29.527  356.106                        Best overall (2016/01/14)
nz      nz      nz    cmix     cmv      nz      nz    cmix     cmv    cmix
0.1.1 -mx   0.1.1 Opt.   0.2.0a3 ICM1 Opt.   0.2.0a3 Extreme Opt.   >0.2.0 a3 -mx   >0.2.0 a5 -mx   >0.2.0 a5Opt.   0.2.0 b1 -mx Darek's testbed
2016/01/10   2016/01/10          2016/09/27             2016/09/27      2016/11/01      2017/05/23      2017/05/23     2017/08/06
1.404.493    1.399.464           1.381.660              1.380.299       1.386.222       1.384.535       1.380.937      1.385.421 0.WAV
323.277      322.990             303.175                301.866         303.864         303.490         302.942        303.793 1.BMP
835.277      834.675             744.125                734.102         745.829         744.815         743.806        745.230 A.TIF
780.311      779.596             692.949                683.189         693.092         692.099         690.734        692.502 B.TGA
327.568      325.838             318.478                318.919         321.261         321.066         318.102        320.968 C.TIF
310.952      310.480             303.235                303.056         303.979         303.659         302.614        303.739 D.TGA
497.089      496.837             496.142                494.695         496.487         496.368         496.055        496.357 E.TIF
110.914      110.799             110.755                110.571         110.809         110.798         110.744        110.805 F.JPG
1.367.023    1.366.851           1.358.405              1.356.314       1.357.977       1.356.525       1.356.251      1.356.311 G.EXE
482.720      482.701             462.836                463.150         461.994         460.785         460.361        460.530 H.EXE
227.898      227.791             217.797                217.981         217.765         217.459         217.096        217.450 I.EXE
43.364       43.325              43.172                 43.179          43.173          43.178          43.151         43.154 J.EXE
2.618.200    2.616.775           2.540.011              2.536.718       2.534.630       2.531.528       2.532.372      2.530.265 K.WAD
2.830.322    2.829.223           2.751.490              2.745.121       2.751.856       2.748.195       2.746.108      2.749.234 L.PAK
55.438       55.027              52.420                 51.990          52.617          52.376          51.634         52.358 M.DBF
86.689       86.688              83.967                 83.837          83.811          83.553          83.457         83.504 N.ADX
3.777        3.775               3.672                  3.668           3.670           3.678           3.669          3.679 O.APR
947          938                 878                    879             912             909             880            913 P.FM3
187.547      187.547             170.171                168.828         169.843         169.048         168.401        168.907 Q.WK3
29.327       29.245              28.781                 28.768          28.782          28.725          28.632         28.721 R.DOC
26.020       25.970              25.447                 25.435          25.418          25.359          25.261         25.353 S.DOC
18.752       18.727              18.367                 18.367          18.285          18.274          18.232         18.253 T.DOC
8.681        8.646               8.536                  8.532           8.530           8.535           8.506          8.527 U.DOC
18.570       18.519              18.213                 18.206          18.198          18.165          18.072         18.158 V.DOC
13.228       13.185              13.007                 13.002          12.995          12.988          12.930         12.985 W.DOC
11.049       10.997              10.854                 10.857          10.863          10.848          10.792         10.849 X.DOC
323          320                 314                    314             318             318             315            317 Y.CFG
176          171                 166                    166             169             171             166            170 Z.MSG
12.619.932   12.607.100          12.159.023             12.122.009      12.163.349      12.147.447      12.132.220     12.148.453 Total
12.166.965      12.152.547                     12.153.501 Testbed.tar (quite close to single file compression total :-))
0.1.1 -m2,3,0x03ededff   >0.2.0 a5 -mx   0.2.0 b1 -mx Other 2
2016/01/10     (>&b12)      2017/05/23     2017/08/06
175.282         160.565        160.533 AIMP_free.tga
5.265.328       5.215.493      5.216.293 FFADMIN.EXE
46.949          43.339         43.227 _FOSSIL_
5.961.864                      5.876.852 human.seq.txt
0.1.1 -mx   >0.2.0 a5 -m2,0,+|0x18000000   >0.2.0 a5 -m2,0,*   0.2.0 b1 -mx Squeeze Chart (Txt Bible (Compressing Text In Different Languages))
2016/01/10                     2017/05/23          2017/05/23     2017/08/06
693.479                        679.334             671.851        671.835 afri.txt
746.900                        729.857             722.773        722.858 alb.txt
624.237                        612.327             608.960        609.200 ara.txt
705.168                        690.952             687.040        686.681 chi.txt
922.105                        903.396             891.634        893.117 cro.txt
749.296                        733.615             728.593        728.817 cze.txt
695.223                        681.642             675.363        675.552 dan.txt
729.682                        715.570             707.159        707.190 dut.txt
659.272                        646.581             638.960        639.120 eng.txt
649.197                        637.874             629.684        629.997 esp.txt
723.353                        709.038             703.339        703.426 fin.txt
692.336                        677.031             669.252        669.427 fre.txt
717.826                        702.885             696.772        696.643 ger.txt
221.765                        218.525             212.385        213.155 gre.txt
625.971                        617.692             605.519        606.438 heb.txt
782.307                        764.490             760.277        760.423 hun.txt
729.490                        715.866             707.533        707.470 ita.txt
637.865                        623.902             621.463        621.365 kor.txt
816.845                        800.804             791.440        792.855 lat.txt
676.332                        660.210             653.353        654.592 lit.txt
678.285                        666.325             656.289        656.523 mao.txt
694.682                        681.233             674.540        674.753 nor.txt
712.650                        699.289             693.137        693.274 por.txt
712.598                        697.016             690.182        690.282 rom.txt
744.556                        725.742             721.408        722.187 rus.txt
708.648                        695.153             687.979        688.398 spa.txt
758.391                        740.013             731.808        732.969 swe.txt
687.045                        671.278             663.161        664.156 tag.txt
778.152                        750.736             748.526        748.798 thai.txt
662.605                        645.354             640.555        641.864 turk.txt
712.785                        696.119             692.605        693.214 vie.txt
715.801                        699.326             692.804        692.591 xho.txt
22.364.847                     21.889.175          21.676.344     21.689.170 Total

```
Previous info
I don't want to take much time to verify my English in this post, sorry if it's bad.

14. ## Thanks:

Darek (4th January 2019)

15. Here are some first scores from the CMVE (x64) version for the Calgary and Canterbury corpora with -mx options.

There are two records for these testsets against almost the newest versions of paq/emma/cmix - after the CMV test I'll need to update the other compressors, but I think CMV could still be the record holder.

- for Calgary corpus geo file CMVE got 42'588 bytes
- for Canterbury corpus kennedy.xls file CMVE got 8'233 bytes

And the table is in the attached file.

16. ## Thanks:

Mauro Vezzosi (15th October 2017)

17. Here are scores on my testset for the CMVE (x64) version compared to the latest official CMV version (0.1.1) and other maximum-compression programs.

In general there is a huge gain - about 630KB - which means CMV is another compressor that breaks the 12MB barrier for the total testset. What is more impressive is the fact that CMV doesn't use models or parsers for image, audio, or text files! Only the exe model is used! Great improvement!

This version sets the absolute records for the E.TIF and Q.WK3 files, and both scores are quite impressive again!
For O.APR and P.FM3 the scores are close to the best cmix scores. There is also a very good score for K.WAD - CMV is the second program to get below 2'500'000 bytes.

CMV 0.2.0 also sets the second-best total for non-modeled files:
5'588'470 - cmix v13m
5'638'310 - CMVE 0.2.0
5'766'085 - paq8px v114

Great job!

18. ## Thanks:

Mauro Vezzosi (16th October 2017)

19. MaximumCompression corpus scores for CMVE (x64). It's about 80KB less than the standard CMV scores. emma and cmix scores are in progress.

20. ## Thanks:

Mauro Vezzosi (17th October 2017)

21. Silesia corpus scores for CMVE 0.2.0. Quite a big improvement over the standard version - 564'000 bytes less!
The cmix v14a scores are not tested yet, and neither is CMV with precompression.

22. ## Thanks:

Mauro Vezzosi (23rd October 2017)

23. Originally Posted by Darek
Silesia corpus scores for CMV 0.2.0.
I suppose you meant CMVE 0.2.0.
Are the Calgary, Canterbury and Silesia scores for -mx or optimized options?

24. Yes. It should be CMVE. Sorry.
These are mostly -mx options. I haven't optimized options for these testsets.
I've only four files with other options than -mx:

dickens - "-m2,3,0x7fed7dfd"
webster - "-m2,3,0x7fed7dfd"
x-ray - "-m1,3,0x7fed7dbf"
xml - "-m2,3,0x7fed7dfb"

But these are not optimized options. It's a matter of time - a whole Silesia test for CMVE takes 4-5 days. Maybe with a faster machine optimization could be doable.
Calgary and Canterbury are possible to optimize - I'll need some time to do it.

25. CMV score comparison of the standard x64 and extreme x64 versions on my testset.
In general there are quite big differences and an amazing gain with the extreme version.

This is especially visible in two files:

E.TIF - this is an 8-bit image compressed by LZW. The original file is 506'390 bytes. The best compressors such as paq/cmix/emma can compress it to 496'xxx bytes, due to the fact that this file has very low redundancy. The standard CMV version has a similar score, 496'075 bytes, but the extreme version got 493'848 - an impressive score for this file.

Q.WK3 - another example of an extremely good CMVE score. Other compressors got 161'xxx bytes (cmix) or 180'xxx-190'xxx bytes (paq and emma). CMV got 168'171, but again CMVE crushes this file to 153'016 bytes - 9% better and far ahead of the other programs.

It looks like the CMV extreme version has some extraordinary, very powerful model/setting which could set another level of compression for non-modeled files!

26. Here are scores of CMVE 0.2.0 for the Silesia corpus - this time both pure and preprocessed scores.
For the preprocessed scores there is a nice gain on the mozilla file - the CMVE score is very close to the cmix v12 or v13 versions!
Also very good scores for x-ray and osdb.

preprocessing includes:
mozilla and samba files preprocessed by precomp V4.6 -cn
dickens and xml preprocessed by DRT
and ooffice preprocessed by paq

The previous CMV version had a better score for webster preprocessed by DRT, but the newest CMVE version got a worse preprocessed score.

Darek

27. ## Thanks:

Mauro Vezzosi (11th November 2017)

28. Thank you for the tests!
Originally Posted by Darek
It looks like the CMV extreme version has some extraordinary, very powerful model/setting which could set another level of compression for non-modeled files!
CMVE is not as extraordinary as it seems: it has the CMV models (some of them improved) + a few simple models and more mixers in the final stage; it needs much more memory and takes a lot of time.
CMV/CMVE are essentially a bunch of sparse models + 3 levels of mixers in decreasing order of context complexity.
It seems that CMVE has 4 records in your testbed, but IIRC 2 of them are beaten by some old cmix version (e.g. O.APR 3608).
The chart has many columns, but how about adding a new column for the minimal file size of all time (and not only of the last 2 versions)?
Code:
```
Some tests with paq8pxd16+cmv.
Darek cmv  paq8pxd16+cmv -mx
1.381.820      1.336.407 0.WAV
303.238        279.608 1.BMP
744.317        488.628 A.TIF
691.395        450.295 B.TGA
2.746.339      2.715.372 L.PAK
5.867.109      5.270.310 Total
```
@Darek
In "SILESIA corpus" of the sheet "CORPUSES DATABASE" there are some wrong expressions, e.g. C\$30+C\$4+C\$10 instead of C\$30.

29. ## Thanks:

Darek (11th November 2017)

30. > The chart has many columns, but how about adding a new column for the minimal file size of all time (and not only of the last 2 versions)?
In the attached file - is it OK? I've also changed the minimal file size from the table scores to the absolute best score - the main difference is on the 0.WAV file, where
OptimFROG 4900ex crunches an amazing score.

Regarding the SILESIA corpus formulas - it varies - these formulas give me the total SILESIA corpus size + mozilla and samba again, because the precompressed files need to be added to the calculation - it was for the cmix file test, and only to estimate the time remaining until the end of the test. For pure SILESIA it could be only C\$30; for CMVE it should be C\$30+C\$3+C\$4+C\$7+C\$10+C\$14

Thanks for testing the files preprocessed by paq - I should test it after finishing enwik8.drt. It looks like the scores after parsers should be comparable to paq or cmix!

31. ## Thanks:

Mauro Vezzosi (11th November 2017)

32. enwik8.drt score for my best setting - "-m2,3,0x7fed7dfd" for CMVE:

16'424'248 bytes in 108'784,85s. Decompression not verified.

Looks promising for enwik9.drt, which should come to something around 133'9xx'xxx bytes. It's a hard test - it needs a long time without computer crashes or switch-offs - but I'm prepared for it. enwik8 was compressed in the best conditions, without breaks. The estimated time to compress enwik9.drt, partially working with lower clocks and with breaks, is about 400 hours.

Darek

33. enwik8 score for my best setting - "-m2,3,0x7fed7dfb" for CMVE:

17'031'015 bytes in 186'372,56s (I've tested the latest paq in parallel). Decompression not verified.

Hmmm, it's a much better result than I expected - it means that pure enwik9 should compress to about 140'3xx'xxx bytes, which means 6th position in the LTCB ranking!

I've started to test enwik9.drt; right now it shows 13d07h to go... Let's see. For some strange reason I couldn't force CMVE to output messages to a text file. After typing
CMVE c "-m2,3,0x7fed7dfd" enwik9.drt ENWIK9_DRT_best.cmv > compress_log.txt I got CMVE c "-m2,3,0x7fed7dfd" enwik9.drt ENWIK9_DRT_best.cmv 1> compress_log.txt (two or three spaces after .cmv). The txt file was created but the messages are displayed on the console...

34. You know what I think: you can speed it up ~2-3x with "-m2,3,0x5fed5dfd", losing <1%.

To redirect the output use the -vo... option:
CMVE c -m2,3,0x7fed7dfd -vocompress_log.txt enwik9.drt ENWIK9_DRT_best.cmv
These commands also redirect the output, but at the end of the lines you'll find only CR, not CR+LF, thus they aren't the best solution.
CMVE c -m2,3,0x7fed7dfd -vo enwik9.drt ENWIK9_DRT_best.cmv > compress_log.txt
CMVE c -m2,3,0x7fed7dfd enwik9.drt ENWIK9_DRT_best.cmv > compress_log.txt 2>&1
Since 0.2.0, -vo redirects the output to a file *and* displays the progress/ETA line; if you want to redirect to a file without displaying the progress/ETA, then add "2> nul" at the end of the command line.
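The console-vs-file behaviour described above comes down to which stream a program writes to: a plain `>` captures only stdout, while progress/ETA lines typically go to stderr. A minimal, self-contained sketch (the `emit` function is a made-up stand-in, not CMVE):

```shell
#!/bin/sh
# A tool that prints its progress to stderr and its result to stdout.
emit() {
  echo "13d07h to go" >&2   # progress -> stderr, stays on the console with plain >
  echo "done"               # result -> stdout
}

emit > only_stdout.log        # captures "done"; the progress line escapes to the console
emit > both.log 2>&1          # merges stderr into stdout: both lines land in the file
emit 2> /dev/null > quiet.log # like "2> nul" on Windows: discard the progress line
```

This is why the `2>&1` variant in the commands above captures everything, and why `2> nul` silences the progress display.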
If you create compress_log.txt, could you give it to me when the compression ends? TIA

35. ## Thanks:

Darek (14th November 2017)

36. Regarding the speedup - yes, I know, but "the best" means really "the best". If I can, I'll try, and now I can. But if this attempt crashes I'll probably try your setup.
One consequence of using the "5" setup instead of "7" is the 1% you mentioned, which could mean a 1MB bigger file....

Thank you for the tip. I've started it again and now it works. Of course I'll give you the log file.
Regards

37. Here are the scores for my maximum settings for enwik8.drt and enwik9.drt compressed by CMVE 0.2.0.

@Matt - can you add these submissions to LTCB?

16'424'248 bytes of enwik8 preprocessed by drt - encode time: 108'784,85s, decode time: not yet, memory used: 19'600MB
129'876'858 bytes of enwik9 preprocessed by drt - encode time: 1'140'801,82s, decode time: not yet, memory used: 19'963MB - that's 4th place on LTCB! Great job Mauro!

option: c "-m2,3,0x7fed7dfd"

System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64

Decompressor batch file size also attached in post.

Regards,
Darek

38. ## Thanks:

Mauro Vezzosi (28th November 2017)

39. Originally Posted by Darek
Here are the scores for my maximum settings for enwik8.drt and enwik9.drt compressed by CMVE 0.2.0.
Thank you very much!

> 129'876'858 bytes of enwik9 preprocessed by drt
It's better than expected.

> encode time: 1'140'801,82s
More than 13 days!

> Decompressor batch file size also attached in post.
If I'm not wrong, drt_dat is expanded to drt that is the Linux version of drt.exe, so you don't need to add it in the decode pack for Windows.

Thanks again.

40. ## Thanks:

Darek (29th November 2017)

41. >If I'm not wrong, drt_dat is expanded to drt that is the Linux version of drt.exe, so you don't need to add it in the decode pack for Windows.
If that's true, then the decompression batch is only about 173KB compressed by 7-Zip - it contains CMVE and modppmd64.dll.

42. >> If I'm not wrong, drt_dat is expanded to drt that is the Linux version of drt.exe, so you don't need to add it in the decode pack for Windows.
> If that's true, then the decompression batch is only about 173KB compressed by 7-Zip - it contains CMVE and modppmd64.dll.
You don't need drt_dat -> drt, but you still need drt_dic -> lpqdict0.dic and drt_exe -> DRT.exe.
7-Zip [64] 16.04:
7z a -mx Decode_pack_enwik9 CMVE.exe drt_dic drt_exe encode.bat modppmd64.dll
253.377 Decode_pack_enwik9.7z
292.037 Decode_pack_enwik9.zip

43. ## Thanks:

Darek (29th November 2017)

44. I've finished decompressing enwik9.drt with CMV extreme... the sha1 check is OK!

Here are the scores for my maximum settings for enwik8.drt and enwik9.drt compressed by CMVE 0.2.0.

@Matt - can you add these submissions to LTCB?

16'424'248 bytes of enwik8 preprocessed by drt - encode time: 108'784,85s, decode time: 124'095,15s - sha1 checksum OK, memory used: 19'600MB
129'876'858 bytes of enwik9 preprocessed by drt - encode time: 1'140'801,82s, decode time: 1'441'561,52s - sha1 checksum OK, memory used: 19'963MB - that's 4th place on LTCB! Great job Mauro!

option: c "-m2,3,0x7fed7dfd"

System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64

Decompressor batch file size also attached in post.

Regards,
Darek

45. ## Thanks:

Mauro Vezzosi (2nd January 2018)

46. ## Thanks (2):

Darek (5th January 2018),Mauro Vezzosi (5th January 2018)

47. @Matt - there is an error on the LTCB page, in the CMVE row, in the Total size column:

129,876,858 - enwik9 compressed size - it's OK
307'787 - decompressor size zipped by you - it's OK
130'301'106 - Total size (sum of the two rows above) - it's not OK - instead it should be 130'184'645, the proper sum of the packed file and the decompressor.
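The corrected total is just the sum of the two sizes listed above, which can be checked directly:

```shell
# Verify the corrected LTCB total: enwik9 compressed size + zipped decompressor size.
compressed=129876858    # enwik9 compressed size
decompressor=307787     # decompressor size zipped
echo $((compressed + decompressor))   # prints 130184645, not 130301106
```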

48. I am wondering if it's possible to write a GUI for CMV, together with its advanced options. I'd be very glad for that, because CMD usage is very complicated for me. I need to test your compressor with my custom dataset.
Thanks a lot for your willingness.

CompressMaster
