Stephan Busch (10th July 2017)
I modified the record model (new heuristic for checking the candidate lengths, and a new context, both derived from EMMA's record model).
I've also modified the BMP parsing code to detect embedded DIBs in executable files.
Again, I just did a few tests, so I might have messed something up.
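For readers who have not looked at the format: an embedded DIB has no "BM" file signature, so a detector has to recognise the 40-byte BITMAPINFOHEADER itself. The following is only a minimal sketch of that idea, not the code from paq8px. The names `DibInfo` and `findDib`, and the size limit `kMaxDim`, are assumptions made for illustration; the field offsets follow the standard BITMAPINFOHEADER layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian readers; BMP/DIB headers are stored little-endian.
static std::uint32_t rd32(const std::uint8_t* p) {
  return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
         std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}
static std::uint16_t rd16(const std::uint8_t* p) {
  return std::uint16_t(p[0] | p[1] << 8);
}

// Hypothetical result type, not taken from paq8px.
struct DibInfo {
  std::size_t offset;   // position of the header inside the buffer
  std::int32_t width;
  std::int32_t height;  // negative means a top-down bitmap
  int bpp;
};

// Scans `buf` for the first plausible 40-byte BITMAPINFOHEADER.
// Returns true and fills `out` when one is found.
bool findDib(const std::vector<std::uint8_t>& buf, DibInfo& out) {
  const std::size_t kHeaderSize = 40;
  const std::int32_t kMaxDim = 1 << 15;  // assumed sanity limit, not from the source
  for (std::size_t i = 0; i + kHeaderSize <= buf.size(); ++i) {
    const std::uint8_t* p = buf.data() + i;
    if (rd32(p) != kHeaderSize) continue;               // biSize
    const std::int32_t w = std::int32_t(rd32(p + 4));   // biWidth
    const std::int32_t h = std::int32_t(rd32(p + 8));   // biHeight
    const int planes = rd16(p + 12);                    // biPlanes
    const int bpp = rd16(p + 14);                       // biBitCount
    const std::uint32_t compression = rd32(p + 16);     // biCompression
    if (w <= 0 || w > kMaxDim) continue;
    if (h == 0 || h > kMaxDim || h < -kMaxDim) continue;
    if (planes != 1) continue;
    if (bpp != 1 && bpp != 4 && bpp != 8 && bpp != 24 && bpp != 32) continue;
    if (compression != 0) continue;  // accept uncompressed (BI_RGB) only
    out = DibInfo{i, w, h, bpp};
    return true;
  }
  return false;
}
```

Several fields are checked together because the value 40 alone occurs all the time in executable code, and a false positive would hand the wrong data to the image model.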
@Darek
Code:
Filename: winrar.exe, 1.551.248 bytes
paq8px_v75 463.677 bytes
paq8px_v77 462.157 bytes

Filename: pic, 513.216 bytes
paq8px_v75 28.953 bytes
paq8px_v77 28.410 bytes
That would probably take a long time, the changes I've made are simple, just a few lines of code, so I didn't have to test on a lot of files.
I'd rather spend that time on PackRAW or EMMA, or a new project.
Last edited by mpais; 11th July 2017 at 20:56. Reason: BMP detection bug fixed, see below
Matt Mahoney (13th July 2017),Stephan Busch (11th July 2017)
Ok, I understand. I'll wait for any changes in any project as well))
Many thanks for v77!
p.s.
Some insight on the BMP model: for one file in my testbed, v77 gets a worse ratio than previous versions:
600'054 - original file 1.BMP
236'167 - paq8px_v76b
322'450 - paq8px_v77 = 37% worse...
Other files get nice improvements.
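The percentage quoted for 1.BMP can be checked with a small helper. This is just a sketch of the arithmetic; the function name `sizeChangePercent` is made up for illustration, and it assumes `before` is non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Relative size change in percent.
// Positive: `after` is larger (worse). Negative: `after` is smaller (a gain).
// Assumes before > 0.
double sizeChangePercent(std::uint64_t before, std::uint64_t after) {
  return (double(after) - double(before)) * 100.0 / double(before);
}
```

Calling `sizeChangePercent(236167, 322450)` with the sizes listed above gives a value between 36% and 37%, consistent with the rounded figure in the post.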
The squeezechart app testset in a tar and all others
Last edited by Stephan Busch; 11th July 2017 at 16:46.
@Darek
Well, it seems I did break something
I'll run a few tests on some bmp files, could you share that file from your testset?
What kind of improvements did you get? These were small tweaks, so I don't expect
major gains. The heuristic is designed to (try to) find the record length faster,
especially for long records, and the new context is designed to try to account for
record structures with fixed-length string fields.
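To make "finding the record length" concrete, here is a generic sketch. It is not the heuristic from paq8px or EMMA, which the post does not spell out. It scores each candidate length by how often a byte equals the byte one record earlier, and it builds a simple context from the column and the byte "above". The names `guessRecordLength` and `recordContext` and the `minScore` threshold are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the candidate length in [minLen, maxLen] whose lagged bytes match
// most often, or 0 if no candidate reaches `minScore` (assumed threshold).
// On ties the shortest length wins, so multiples of the true length lose.
std::size_t guessRecordLength(const std::vector<std::uint8_t>& buf,
                              std::size_t minLen, std::size_t maxLen,
                              double minScore = 0.25) {
  std::size_t best = 0;
  double bestScore = 0.0;
  for (std::size_t len = minLen; len <= maxLen && len < buf.size(); ++len) {
    const std::size_t n = buf.size() - len;  // number of comparable positions
    std::size_t matches = 0;
    for (std::size_t i = 0; i < n; ++i) {
      matches += (buf[i] == buf[i + len]) ? 1 : 0;
    }
    const double score = double(matches) / double(n);
    if (score > bestScore) {
      bestScore = score;
      best = len;
    }
  }
  return bestScore >= minScore ? best : 0;
}

// A typical record-style context: column within the record combined with the
// byte at the same column in the previous record. Requires pos >= len > 0.
std::uint32_t recordContext(const std::vector<std::uint8_t>& buf,
                            std::size_t pos, std::size_t len) {
  const std::uint32_t column = std::uint32_t(pos % len);
  const std::uint32_t above = buf[pos - len];
  return (column << 8) | above;
}
```

This brute-force scan costs time proportional to the buffer size times the number of candidates, which is one reason a real compressor would want a faster heuristic for long records.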
[EDIT]
Ok, I've fixed it and updated the attachment
@Stephan Busch
That's strange, if v76 was ok but v76b isn't, then the problem is in the 3 lines I changed
to detect more JPEGs. I compiled both versions with the same options, those documented
in the source code itself, so I don't think that is it.
I'll download some of your testsets and try it out, and I'll send you a version of v77 without
those changes to the JPEG detection, to see if you can run it.
Last edited by mpais; 11th July 2017 at 20:57.
Darek (11th July 2017)
I can't say that there are no major gains. Of course, for most files the gain is about 0.1-0.3%, but there are exceptions, especially among the bigger files:
K.WAD gains about 0.5% - quite big.
L.PAK gained 1.0%, and with that paq8px v77 takes the crown for the best score on this file in my testbed!
L.PAK contains a lot of WAVs, so I'm asking about a better audio tweak like the one used in EMMA; with that algorithm it could crunch this score even more.
M.DBF gained 1.8%.
In the attached file you have the full testbed comparison with paq8px v75. The only difference between v75 and v76 is the JPG file = 7.3% gain.
Also compared are the latest best paq version (paq8kx v7), emma v23x64 and cmix v13.
In the second attached file you have 1.BMP.
Darek
mpais (11th July 2017)
Thank you Darek, the fixed version I just posted compresses your BMP file to 235.766 bytes.
I'm currently testing on Stephan's testsets (it will take a few hours), but I can't seem to replicate his problem.
Checking your results, it seems it would probably be more interesting to make a few changes to cmix.
But I'm guessing it will be even harder to change something there without breaking something else.
Well, then maybe it is actually related to the compiler used (or the options). Maybe Jan can compile it with the same configuration he used for FP8.
Did the modified JPEG detection improve results on your raw testset?
Try adding -Wl,--large-address-aware maybe?
I haven't tested on many camera raw files yet, but where I tested it, compression improved.
Decompression was not tested so far.
Has somebody tested with Eugene's compiler parameters?
I cannot compile myself because g++ needs zlib.h and I don't know which version.
Thanks! With this change paq8px v77 becomes the best single compressor for my whole testbed, with a score of 11'382'643! Great job!
As for cmix: yes, this compressor has the best compression ratio for files without a dedicated model. However, apart from the text, exe, jpg and bmp models it has no others, so files such as tiff, wave, tga and l.pak get worse scores than with other compressors. For my testbed, cmix with all models could gain about 975KB extra and score about 11'110'252 bytes (very close to the sum of best scores for all files = 10'910'423). I don't want to force Byron to make models particularly for my testbed, but I'm waiting. Maybe in the future something will change in cmix too.
PAQ variants have models for all the most popular exe, text, audio and image file types, developed over years by many people, so despite not having the best general compression, paq wins! Hmm, this is a great example of distributed teamwork!
@Shelwien, Stephan Busch
I'm not well versed in C++ development, I usually just dabble with C++ Builder, so I'll let someone more experienced compile it.
I commented the source code to try to explain what I'm doing, maybe I did something wrong.
@Darek
I've just had a quick glance at the source code for cmix, it seems the changes I made to paq8px can be used there too.
As for detecting other file types, I can make a few quick changes here and there to the parsing in paq8px, but getting it
to detect as many types as EMMA would require a lot of effort, the source code for its parsers alone is bigger (in line count)
than all the source for paq8px.
Darek (11th July 2017)
I have this MinGW compiler version installed: gcc (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 4.9.2
and used this command: g++ paq8px.cpp -DWINDOWS -lz -Wall -Wextra -O3 -static -static-libgcc -opaq8px.exe
resulting binary is attached
Sure I will do it when I have some time...
Stephan Busch (11th July 2017),xinix (12th July 2017)
compiled some.
Code:
// book1 compression/decompression
136.563s 137.468s: paq8px_v77 from paq8px_v77.zip
 73.460s  73.836s: paq8px_v77_ic18_x32
 73.492s  73.945s: paq8px_v77_gcc71_x32
 64.568s  64.647s: paq8px_v77_ic18_x64
 63.072s  62.868s: paq8px_v77_gcc70_x64
Bulat Ziganshin (12th July 2017),comp1 (12th July 2017),Darek (12th July 2017),mpais (12th July 2017),xinix (12th July 2017)
Speedup of x64 gcc version on my laptop is sometimes about 3 times!
I've tested the SILESIA benchmark with paq8px_v77 and got the best score among the submitted results for this benchmark, without using precomp! I think this score could be submitted to the SILESIA benchmark - is that possible, Matt?
Thanks to tar file recognition and parsing, v77 takes the lead. Using precomp 4.5 -cn gives only 4KB less for the mozilla file, and the samba score is even worse.
The attached table compares SILESIA scores for emma, paq and cmix (bytes are estimated averages, made to sum up to the submitted results) with and without precomp; the cmix v13 values w/o precomp are estimated.
Darek
@Shelwien
is there a special reason why you use a pre-release of gcc 7 (x64)?
> is there a special reason why you use a pre-release of gcc 7 (x64)?
Yes. I don't know a specific reason, but my tests show that gcc70 > gcc71 > gcc63.
Probably more aggressive defaults used, or some such.
> Speedup of x64 gcc version on my laptop is sometimes about 3 times!
Nice. There's actually still PGO, AVX and large pages, so speed can be improved further.
Ok, here I added large page support:
Code:
67.159s 66.596s // gcc63_x32
61.777s 62.026s // gcc63_x64
66.753s 66.519s // gcc70_x32
61.526s 62.198s // gcc70_x64
67.096s 66.534s // gcc71_x32
60.996s 61.309s // gcc71_x64
59.249s 59.561s // gcc71_x64_PGO
comp1 (12th July 2017),Darek (12th July 2017),Stephan Busch (12th July 2017)
I've found a small error in the code, so I fixed it and added 2 new contexts from EMMA to the record model.
Code:
Filename: sao, 7.251.944 bytes (from Silesia Corpus)
paq8px_v77 3.775.675 bytes
paq8px_v78 3.762.400 bytes

Filename: pic, 513.216 bytes
paq8px_v77 28.410 bytes
paq8px_v78 27.946 bytes
comp1 (14th July 2017),Darek (13th July 2017),Mike (14th July 2017),Stephan Busch (13th July 2017)
Code:
68.640s 68.001s: paq8px_v77b_gcc71_x32.exe -8 book1
67.783s 67.611s: paq8px_v77b_ic18_x32.exe -8 book1
66.035s 65.754s: paq8px_v77b_ic18_x64.exe -8 book1
65.520s 64.507s: paq8px_v77b_ic18_x64_PGO.exe -8 book1
62.353s 62.479s: paq8px_v77b_gcc71_x64.exe -8 book1
61.870s 60.949s: paq8px_v77b_gcc71_x64_PGO.exe -8 book1
One important question - is this new version number v78 or v77b?
Again, nice improvement!
Scores in JPG file.
Last edited by Darek; 14th July 2017 at 01:13.
I don't think it deserves a new version number, the changes are really small, that's why I named it v77b.
I also tried compiling cmix, so I could apply the same changes, but was unsuccessful.
When I have some time, I'll see if there are more simple things from EMMA that I can merge with paq8px.
@byronknoll
I've tried the compile options you suggest on GitHub, but I can't get it to compile on Windows. Do I need a specific compiler (or compiler version)?
The changes I made are simple so I'm sure you'll be able to port them. I'd just like to try to port some of the more complex components of EMMA
to cmix (ludicrous mode especially, and the text model with the english stemmer) to see what sort of gain they'd give on a much more complex
mixing strategy than the one I use with EMMA. Those would probably be overkill for paq8px (ludicrous mode is usually a 3 to 5x slowdown) but
since with cmix you're going for maximum compression at any cost, that wouldn't be so bad.