I've found a parameter that can be tweaked to slightly modify Zopfli's behaviour, in lz77.c:
Code:
/*
Gets the value of the length given the distance. Typically, the value of the
length is the length, but if the distance is very long, decrease the value of
the length a bit to make up for the fact that long distances use large amounts
of extra bits.
*/
static int GetLengthValue(int length, int distance) {
/*
At distance > 1024, using length 3 is no longer good, due to the large amount
of extra bits for the distance code. distance > 1024 uses 9+ extra bits, and
this seems to be the sweet spot.
*/
return distance > 1024 ? length - 1 : length;
}
Replacing 1024 in the line "return distance > 1024 ? length - 1 : length;" by 512, 768, 1536, 2048, 3072 or 4096 will produce a different file (even block splitting is affected).
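To make the experiment above easier to repeat, here is a minimal sketch, not Zopfli's actual code. DistExtraBits and GetLengthValueWithThreshold are hypothetical names introduced for illustration. The first computes how many extra bits a DEFLATE distance needs (per the RFC 1951 distance table), which is what the "9+ extra bits" comment refers to. The second is the same function as above with the threshold passed in instead of hard-coded.

```c
/* Hypothetical helper: number of extra bits DEFLATE uses for a match
   distance in 1..32768. Distances 1-4 need none; above that it is
   floor(log2(distance - 1)) - 1, matching the RFC 1951 distance table. */
static int DistExtraBits(int distance) {
  int bits = 0;
  int d;
  if (distance < 5) return 0;
  /* Count how many times distance - 1 can be halved: floor(log2). */
  for (d = distance - 1; d > 1; d >>= 1) bits++;
  return bits - 1;
}

/* Hypothetical variant of GetLengthValue: same logic, but the distance
   threshold is a parameter so 512, 768, 1536, ... can be tried without
   editing the source each time. */
static int GetLengthValueWithThreshold(int length, int distance,
                                       int threshold) {
  return distance > threshold ? length - 1 : length;
}
```

With this, DistExtraBits(1024) is 8 and DistExtraBits(1025) is 9, so the original threshold sits exactly where the extra-bit count goes from 8 to 9. A threshold of 512 would instead sit at the 7-to-8 boundary, and 2048 at the 9-to-10 boundary.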
For instance on the file deflate.c it gives this (size in bytes):
I don't know against what they trained Zopfli to pick 1024, but apparently it was not these two files.
This is pretty similar to the TOO_FAR story in Zlib: http://optipng.sourceforge.net/pngtech/too_far.html
Oh, I forgot:
Code:
huffmix -v book1-zop4096-i100d.gz book1-zop4096d.gz book1-zop4096-mix.gz
book1-zop4096-i100.gz (298746 bytes)
Block boundaries: 0,9a3,415f (3 blocks)
book1-zop4096.gz (298751 bytes)
Block boundaries: 0,9a3,415f (3 blocks)
File Type C-Offset C-Length U-Offset U-Length
A 2 0 10608 0 2467
A 2 10608 49202 9a3 14268
A 2 59810 2330011 415f 752036
File Type C-Offset C-Length U-Offset U-Length
B 2 0 10604 0 2467
B 2 10604 49253 9a3 14268
B 2 59857 2330005 415f 752036
File C-Offset C-Length
B 0 10604
A 10608 49202
B 59857 2330005
Saved 10 bits, output file size 298745 bytes
And it strikes me right now: why do they systematically subtract one, rather than only when the length is small (3 or 4)?! It could be a bug; I'll check it tomorrow.
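For illustration only, the alternative I have in mind would look something like the sketch below. GetLengthValueShortOnly is a hypothetical name, the cut-off of 4 is an assumption taken from the "3 or 4" above, and I have not tested whether it actually compresses better.

```c
/* Hypothetical variant: penalise a long distance only when the match is
   short (length 3 or 4). Longer matches keep their full length value,
   unlike the original, which subtracts one for every length. */
static int GetLengthValueShortOnly(int length, int distance) {
  return (distance > 1024 && length <= 4) ? length - 1 : length;
}
```

For a length 10 match at distance 2000, the original function returns 9, while this variant returns 10.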
Last edited by caveman; 6th March 2013 at 04:45.
Reason: spell checking
Well, I know for sure there is a win32 compile of pigz; I downloaded it a while ago. Hopefully that same person can do a new win32 compile of the new Zopfli code.
Does this mean that we can use our 8-core processors to speed up Zopfli implementation with pigz by means of parallel processing? Further, since pigz will support .ZIP archive format, it would seem we could then create .ZIP archives with multiple files as well...
I've just made a win32 build of pigz 2.3, compiled with mingw gcc 4.7.2, pthreads 2.9.0 and zlib 1.2.7
I've included a patch file of the changes made, but I'll summarise the important one here:
- MinGW doesn't seem to support symlinks, at least with sys/stat.h. Windows symlinks should still work, but this build won't handle them differently from ordinary files (i.e. it won't skip symlinks).
Cygwin may or may not do things better - I don't know as I don't have it installed.
Hope that helps!
Last edited by DotDotDot; 21st March 2013 at 15:27.
Download the pthreads-win32 tarball into the directory where you previously downloaded and unpacked the pigz tarball, unpack it, and compile it with:
$ make clean GC-static
This creates the libpthreadGC2.a archive.
Thanks again for the tip AiZ!
I didn't grab the source code for pthreads; I just used mingw-get, which doesn't seem to have a static version. The DLL doesn't really bother me personally, but if anyone really does want a fully static build, your instructions will be invaluable.
I'll keep that in mind for any future builds - thanks again!
Zopfli has the function OptimizeHuffmanForRle, which is used to improve the compression of Huffman trees. Brotli has an improved, but compatible, version.
How can this function be ported to Zopfli?
I came up with that hack and wrote both of these functions originally. Last week, I played with this for about 3 hours (trying to fit the function from brotli to zopfli), and couldn't get savings in my benchmarks. Sometimes it is smaller, sometimes larger. The coding of Huffman codes is different in brotli and deflate, so the manipulation of the rle coding reflects this, and this is the main reason why brotli's code is different from zopfli.
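To show what the rle coding is manipulated for on the deflate side, here is a rough sketch, not the Zopfli or brotli code. DEFLATE (RFC 1951) sends the code lengths themselves through a small alphabet: symbols 0-15 are literal lengths, 16 repeats the previous length 3-6 times (2 extra bits), 17 repeats zero 3-10 times (3 extra bits), and 18 repeats zero 11-138 times (7 extra bits). RleCost and DeflateRleCost are hypothetical names, and the greedy run splitting is an assumption made for simplicity.

```c
#include <stddef.h>

/* Hypothetical result type: how many code-length symbols and how many
   extra bits a sequence of Huffman code lengths needs. */
typedef struct {
  size_t symbols;
  size_t extra_bits;
} RleCost;

/* Greedy estimate of DEFLATE's run-length coding of code lengths. */
static RleCost DeflateRleCost(const unsigned char* lengths, size_t n) {
  RleCost cost = {0, 0};
  size_t i = 0;
  while (i < n) {
    size_t run = 1;
    size_t left;
    while (i + run < n && lengths[i + run] == lengths[i]) run++;
    left = run;
    if (lengths[i] == 0) {
      /* Symbol 18: 11-138 zeros, 7 extra bits. */
      while (left >= 11) {
        left -= left > 138 ? 138 : left;
        cost.symbols++;
        cost.extra_bits += 7;
      }
      /* Symbol 17: 3-10 zeros, 3 extra bits. */
      if (left >= 3) {
        left = 0;
        cost.symbols++;
        cost.extra_bits += 3;
      }
    } else {
      /* The first length of a non-zero run is always sent literally. */
      cost.symbols++;
      left--;
      /* Symbol 16: repeat previous length 3-6 times, 2 extra bits. */
      while (left >= 3) {
        left -= left > 6 ? 6 : left;
        cost.symbols++;
        cost.extra_bits += 2;
      }
    }
    cost.symbols += left; /* Whatever remains is sent as literals. */
    i += run;
  }
  return cost;
}
```

Eight lengths of 5 in a row cost 3 symbols and 2 extra bits, while the alternating sequence 5, 4, 5, 4, 5, 4, 5, 4 costs 8 symbols. That gap is why nudging code lengths into runs can pay off even when the resulting Huffman code is slightly worse.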
Unfortunately, I cannot carry out the test because I cannot apply the patch. If somebody applies the patch and sends me the files, I'll be happy to run the tests.