100000000/126675.67 = 789 bytes/s. But that's still 10x faster than NNCP :)
I recommend enabling dictionary preprocessing to improve compression rate and compression time: timer cmix -c dictionary/english.dic enwik8 enwik8.cmix18
Compiling cmix yourself will also produce a faster executable. For cmix v18 on my computer, enwik8 compresses to 14838332 in 57508 seconds.
>Compiling cmix yourself will also produce a faster executable. For cmix v18 on my computer, enwik8 compresses to 14838332 in 57508 seconds.
@Byron -> how can I compile cmix myself? Is there a fast and easy compiler to use for it? I'm a lamer in this kind of topic.
For windows you can get gcc/mingw: https://sourceforge.net/projects/min...onal%20Builds/
Then compile it like this: https://encode.su/threads/1925-cmix?...ll=1#post62052
There would be more compiler options for best speed, though.
Darek (4th January 2020)
In Linux, just run "make". In Windows it is a bit more difficult. You can try either MinGW (http://nuwen.net/mingw.html) or Cygwin (https://www.cygwin.com) and then run "make".
Darek (4th January 2020)
For "make" it's better to get msys: http://www.msys2.org/ - it's the only system with a working package manager.
MinGW distributions frequently don't have any make at all, or it doesn't work.
And Cygwin requires manually selecting some packages in the GUI setup - you have to know what you need when installing it.
So I think the best method is MinGW plus a list file passed to g++ as @list, like I suggested before. Make is not needed to build cmix.
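To make the "list file" idea concrete, here is a minimal Python sketch that writes such a file and assembles the g++ command line. It only relies on g++ accepting `@file` to read arguments from a file. The directory name, the list file name `list`, and the helper names are assumptions for illustration, not taken from the cmix repository; the flags are the ones mentioned in this thread. It also assumes every .cpp under the chosen directory belongs to the same program.

```python
from pathlib import Path


def write_list_file(src_dir, list_path="list"):
    """Write every .cpp file under src_dir to a g++ response file.

    Paths are written with forward slashes and wrapped in double quotes,
    because backslashes inside a response file are read as escapes.
    Returns the list of paths that were written.
    """
    sources = sorted(Path(src_dir).rglob("*.cpp"))
    posix_paths = [p.as_posix() for p in sources]
    lines = ['"%s"' % p for p in posix_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return posix_paths


def build_command(list_path="list", output="cmix",
                  flags=("-Ofast", "-march=native")):
    """Return the g++ argument list; "@list" tells g++ to read the file."""
    return ["g++", *flags, "@" + str(list_path), "-o", output]
```

Calling `write_list_file("src")` and then passing `build_command()` to `subprocess.run(..., check=True)` would start the build. Swapping `-Ofast` for `-O3` in `flags` gives the more conservative variant.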
It's an executable from your webpage. Now I'm running the ENWIK9 compression - I think it will take more than 10 days. Will test the dictionary right after!
Mingw-w64 (x86_64-8.1.0-posix-seh-rt_v6-rev0) works well for me. Just replace clang++ with g++ in the makefile and run "mingw32-make".
Shelwien (3rd January 2020)
The ENWIK9 compression has been running for 6 days and is 34% complete. Has anyone ever rechecked the results?
Results with a slightly higher overclock:
i5-9600K @ 4.9 GHz all cores. 5.0 GHz and above requires much higher voltage which is not safe for 24/7 running.
And I don't believe your compression/decompression timings. Either that, or the EXE on your website is an extremely poor compile - please recompile and update the page!
Code:
C:\cmix>timer cmix -c dictionary/english.dic enwik8 enwik8.cmix18
Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
100000000 bytes -> 14847703 bytes in 76734.24 s.
cross entropy: 1.188
Kernel Time = 15.625 = 00:00:15.625 = 0%
User Time = 76713.015 = 21:18:33.015 = 99%
Process Time = 76728.640 = 21:18:48.640 = 99%
Global Time = 76736.093 = 21:18:56.093 = 100%
cmix dynamically allocates lots of small memory blocks (mostly for the LSTM model),
which is known to have a large effect on speed and memory consumption depending on the OS and compiler.
Also there could be a difference in vector extensions used.
The executable on the cmix page was compiled without "-march=native" to improve compatibility between computers. If you compile your own executable with "-Ofast -march=native" it will be significantly faster. I think the main reason is auto-vectorization and SIMD. The cmix benchmarks I ran were not with the public executable, but with one compiled with "-march=native".
Results with a compile by Shelwien (Thank you!):
Code:
C:\cmix>timer cmix -c dictionary/english.dic enwik8 enwik8.cmix18
Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
100000000 bytes -> 14846066 bytes in 63749.76 s.
cross entropy: 1.188
Kernel Time = 13.484 = 00:00:13.484 = 0%
User Time = 63731.453 = 17:42:11.453 = 99%
Process Time = 63744.937 = 17:42:24.937 = 99%
Global Time = 63751.796 = 17:42:31.796 = 100%
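For anyone who wants to compare the two enwik8 runs above, here is a small Python sketch that turns the reported sizes and times into throughput, speedup, and bits per input byte. The numbers are copied from the Timer output in this thread; the function names and run labels are mine. Reading "cross entropy" as compressed bits per input byte is an assumption, although it is consistent with the reported 1.188.

```python
ENWIK8_BYTES = 100_000_000


def throughput(input_bytes, seconds):
    """Input bytes processed per second."""
    return input_bytes / seconds


def bits_per_byte(compressed_bytes, input_bytes):
    """Compressed bits spent per input byte."""
    return 8 * compressed_bytes / input_bytes


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


# (compressed size in bytes, wall time in seconds) as reported in the thread
runs = {
    "executable from the cmix page": (14_847_703, 76_734.24),
    "compile by Shelwien": (14_846_066, 63_749.76),
}

for label, (size, secs) in runs.items():
    print("%s: %.1f bytes/s, %.3f bits/byte"
          % (label, throughput(ENWIK8_BYTES, secs),
             bits_per_byte(size, ENWIK8_BYTES)))

old_secs = runs["executable from the cmix page"][1]
new_secs = runs["compile by Shelwien"][1]
print("speedup: %.2fx" % speedup(old_secs, new_secs))
```

On these figures the recompiled binary is roughly 1.2 times faster on the same machine, while the compressed size changes only slightly.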
cmix (v18) -c english.dic enwik9:
115,739,547 bytes, 756,510.756 sec. (8.76 days, 25GB memory use, cross entropy 0.926)
For enwik10 it would need about 87.6 days (~3 months).
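The enwik10 estimate is a straight linear extrapolation from the enwik9 time. A minimal sketch of that arithmetic in Python follows. It assumes run time scales linearly with input size, which is only a rough guess for a file ten times larger; the function names are mine.

```python
SECONDS_PER_DAY = 86_400


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def extrapolate_days(measured_seconds, size_ratio):
    """Scale a measured run time linearly by the ratio of input sizes."""
    return seconds_to_days(measured_seconds) * size_ratio


enwik9_seconds = 756_510.756  # enwik9 time reported in this thread
print("enwik9: %.2f days" % seconds_to_days(enwik9_seconds))
print("enwik10 (10x larger, linear guess): %.1f days"
      % extrapolate_days(enwik9_seconds, 10))
```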
@sportman If you want to test the decompression function of cmix, use a small file; it does not take very long.
I have tested cmix17 on wrtpre.cpp, and the hash value after decompression does not match that of the original file. Why?
This is the hash value of wrtpre.cpp: 3FE3BD3E77A2A34869EC12FD77491EF9D0192BFA
I attach the source code and the binary of cmix17; I compiled it using Dev-C++.
It works for me. Here is a Colab where you can see it compress+decompress to the same md5: https://colab.research.google.com/dr...eX-ZSjMgF29Ci7
I didn't use your binary. Here are some suggestions that might help:
- change the compiler flag from -Ofast to -O3. The binary will be slower, but might fix the issue you are seeing.
- upgrade your compiler to a more recent version.
- change to a different compiler - I recommend clang.
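To check a round trip after trying any of these, it is enough to hash the original and the decompressed file and compare. Here is a minimal Python sketch using the standard hashlib module. The function names are mine, and SHA-1 as the default is an assumption based on the 40 hex digits of the hash posted above (the Colab compares md5 instead; pass `algorithm="md5"` for that).

```python
import hashlib


def digest_of_file(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so that large files do not fill memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def round_trip_ok(original_path, restored_path, algorithm="sha1"):
    """True if the original and the decompressed file hash to the same value."""
    return (digest_of_file(original_path, algorithm)
            == digest_of_file(restored_path, algorithm))


def matches_published(path, published_hex, algorithm="sha1"):
    """Compare a file's hash with a posted hex string, ignoring letter case."""
    return digest_of_file(path, algorithm) == published_hex.strip().lower()
```

For example, `round_trip_ok("wrtpre.cpp", "wrtpre.cpp.out")` would compare the original with a decompressed copy, where the second file name is just a placeholder.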