Compared to 1.2.4, GZIPHACK is compiled with a much newer compiler - Visual C++ 2005 SP1. However, the newest gzip 1.3.5 has higher compression speed anyway.
Would it be possible (I mean something doable in a few minutes or hours of an afternoon, not a week of work) to "hack" CABARC to use a bigger dictionary than 2 MB, let's say 32 MB?
I think not. My guesses:
+ No source code of an LZX encoder is available
+ Maybe it is possible to crack the executable, but compatibility will be lost anyway
Better to use LZMA directly...
I've seen an LZX-compatible compressor, but I'm not sure it used optimal parsing. What do you need it for? It can't be decompressed by existing LZX tools, so LZMA is better anyway. (in reply to Black_Fox)
I know that from comp.compression, but I still hope that you can make a hack for zlib too. (in reply to encode)
You can get a full encoder/decoder spec from the following links.
Cabinet File specs from Microsoft were used.
Download it at this URL:
http://www.speakeasy.org/~russotto/chm/
This encoder is even used for NTFS support under Linux in some programs (I can't remember them now).
http://www.cabextract.org.uk/libmspack/
It's a compact library to unpack most of MS's old compression algorithms.
BTW, I contacted Igor Pavlov years ago (I recall 2002) about an LZX implementation, and he told me that his BIX archiver was based on a similar algorithm, but with a bigger dictionary (actually 4 MB) and an extra x86 filter... so here is the full info to whom it may concern.
Sorry, here is the cab SDK download page:
http://coding.wooyayhoopla.be/lcab/files/cab-sdk.zip
BIX download:
http://www.7-zip.org/igor.html
I've tried hacking cabarc.exe. I've relaxed the limits of the LZX compression levels (the appropriate instruction to change is at offset 40210 in the executable) to 12..27 instead of the default 15..21, but when I select values outside the default range, the internal FCI library throws error 8 at adding a file and the program crashes. The exception is level 22, where it doesn't print an error message but crashes anyway. (in reply to Black_Fox)
I think the errors are caused by limitations of the file format. I guess that CAB uses 3 bits to describe the compression level used (3 bits = 8 possible values; by default LZX uses only seven compression levels, so I think that is why level 22 doesn't generate an error on FCIAddFile()). Additionally, LZX uses a fixed number of position slots, so it would crash when it finds a match whose offset won't fit in any position slot (I guess this is what causes the crash when selecting level 22).
So hacking CABARC to support larger dictionaries would require major changes to both the file format and the executable itself. But maybe I'll hack it someday - it'd be fun to do.
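To make the position-slot point concrete, here is a rough sketch in C. The slot table and offset limit below are invented purely for illustration - they are not the real LZX tables, whose slot count is fixed by the window size:

```c
#include <stdio.h>

/* Illustration only: an LZX-style encoder maps every match offset to one
 * of a fixed set of "position slots". The table below is made up - the
 * real LZX slot layout depends on the window size chosen at compile time. */
static const unsigned slot_base[] = { 1, 2, 3, 4, 6, 8, 12, 16, 24, 32 };
#define NUM_SLOTS (sizeof(slot_base) / sizeof(slot_base[0]))
#define MAX_REPRESENTABLE_OFFSET 48u   /* largest offset any slot can hold */

static int offset_to_slot(unsigned offset)
{
    unsigned i;
    if (offset == 0 || offset >= MAX_REPRESENTABLE_OFFSET)
        return -1;              /* no slot fits: the encoder has to bail out */
    for (i = NUM_SLOTS; i-- > 0; )
        if (offset >= slot_base[i])
            return (int)i;
    return 0;
}

int main(void)
{
    printf("offset 20  -> slot %d\n", offset_to_slot(20));
    printf("offset 100 -> slot %d\n", offset_to_slot(100)); /* -1: too far */
    return 0;
}
```

With a hacked, larger dictionary the match finder starts producing offsets beyond the last slot, which would explain a hard FCI error rather than just worse compression.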
BTW: on textual files CABARC and 7-Zip produce almost identically sized archives (for the same dictionary size), so LZMA can give some indication.
I've been curious how ratio and speeds would change. (in reply to Bulat Ziganshin)
Thanks for all the links! I'll see what BIX can do... (in reply to maadjordan)
That seems like too much work, unless a lot more people were interested in its outcome. (in reply to donkey7)
LZH is sort of LZX somehow... Squeeze compression is based on the LZH algorithm but with filters and extra dictionaries.
I've tested all of these (BIX, SQZ, LH7, ...) but with no luck. LZX is good but slow, and if you increase the dictionary size it becomes even slower, unless the process is multi-threaded...
I hacked zlib then, but it doesn't seem to work (CRC error):
/* If prev_match is also MIN_MATCH, match_start is garbage
* but we will ignore the current match anyway.
*/
s->match_length = MIN_MATCH-1;
}
}
+ /* a small HACK written in 5 minutes
+ TODO: write more clever implementation
+ */
+ if (s->match_length >= MIN_MATCH && s->level == 9) {
+ int tmp_strstart = s->strstart; /* store global variables */
+ int tmp_match_start = s->match_start;
+ int dist = s->strstart-s->match_start;
+ int next_len;
+ int next_dist;
+ unsigned hash = s->ins_h;
+ int i;
+
+ /* lazy matching with 2 byte lookahead */
+ for (i = 0; i < 2; i++) {
+ UPDATE_HASH(s, s->ins_h, s->window[(++s->strstart) + MIN_MATCH-1]);
+
+ next_len = longest_match (s, hash_head); /* get match length and distance */
+ next_dist = s->strstart-s->match_start;
+
+ /* check for a better match,
+ also check the distance of the followed match */
+ if ((next_len > ((s->match_length + 1) + i))
+ || ((next_len > (s->match_length + i)) && ((next_dist >> 3) < dist))) {
+
+ s->match_length = 0; /* discard current match */
+ break;
+ }
+ }
+
+ s->strstart = tmp_strstart; /* restore values */
+ s->match_start = tmp_match_start;
+ }
+ /* End of hack?
+ */
+
/* If there was a match at the previous step and the current
* match is not better, output the previous match:
*/
if (s->prev_length >= MIN_MATCH && s->match_length <= s->prev_length) {
uInt max_insert = s->strstart + s->lookahead - MIN_MATCH;
I just patch deflate_fast(), inserting one extra check, kind of. So the extra string check must be placed between the main string search and the unit output:
Yes, the end of the hack is where we restore the global variables.
if (hash_head != NIL && s->strstart - hash_head <= MAX_DIST(s)) {
/* To simplify the code, we prevent matches with the string
* of window index 0 (in particular we have to avoid a match
* of the string with itself at the start of the input file).
*/
#ifdef FASTEST
if ((s->strategy != Z_HUFFMAN_ONLY && s->strategy != Z_RLE) ||
(s->strategy == Z_RLE && s->strstart - hash_head == 1)) {
s->match_length = longest_match_fast (s, hash_head);
}
#else
if (s->strategy != Z_HUFFMAN_ONLY && s->strategy != Z_RLE) {
s->match_length = longest_match (s, hash_head);
} else if (s->strategy == Z_RLE && s->strstart - hash_head == 1) {
s->match_length = longest_match_fast (s, hash_head);
}
#endif
/* longest_match() or longest_match_fast() sets match_start */
}
HERE:
if (s->match_length >= MIN_MATCH) {
    /* TODO: place the extra check here */
}
if (s->match_length >= MIN_MATCH) {
check_match(s, s->strstart, s->match_start, s->match_length);
_tr_tally_dist(s, s->strstart - s->match_start,
s->match_length - MIN_MATCH, bflush);
...
The code from GZIPHACK, with some changes, should work. Watch out for the hash calculation: with the original ZLIB/GZIP the hash value updates sequentially, but we need random access to the hash value - i.e. we must know the real hash at any position (i+1, i+2), and we may not change the global hash value. In my implementation I just keep the original hash (plus the global variables needed by longest_match()), and I directly modify a copy of this hash for the next 1-2 bytes.
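A minimal sketch of that idea, assuming it sits inside deflate.c (so deflate_state, uInt, MIN_MATCH, s->window and zlib's UPDATE_HASH macro are in scope); lookahead_hash is a hypothetical helper name, not GZIPHACK's actual code:

```c
/* Compute the hash valid at position s->strstart + offset (offset = 1 or 2)
 * on a local copy, so the global rolling hash s->ins_h is never touched.
 * UPDATE_HASH(s,h,c) is zlib's own macro:
 *     h = (((h) << s->hash_shift) ^ (c)) & s->hash_mask
 */
static uInt lookahead_hash(deflate_state *s, uInt offset)
{
    uInt h = s->ins_h;                /* start from the hash at strstart */
    uInt i;
    for (i = 1; i <= offset; i++)
        UPDATE_HASH(s, h, s->window[s->strstart + i + MIN_MATCH - 1]);
    return h;
}
```

The chain head for the shifted position could then be looked up as s->head[lookahead_hash(s, 1)] without inserting anything into the hash table or disturbing s->ins_h.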
As I can see:
You modify the deflate with lazy matches. You shouldn't. Modify the fast version of deflate, since that function finds and outputs matches straight away! (in reply to roytam1)
Good luck!
OK, it works now. (in reply to encode)
test data: world95.txt (text-test.rar from MC)
26/08/2007 09:50 863,370 world95_9.txt.gz
26/08/2007 10:14 857,939 world95_lc2_9.txt.gz
test data: fp.log (log-test.rar from MC)
26/08/2007 10:17 1,333,125 fp_9.log.gz
26/08/2007 10:15 1,313,936 fp_lc2_9.log.gz
Timer data:
using deflate_slow:
F:jatfzlib123>timer minigzip -9 fp.log
Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 0.046 = 00:00:00.046 = 3%
User Time = 1.093 = 00:00:01.093 = 87%
Process Time = 1.140 = 00:00:01.140 = 91%
Global Time = 1.250 = 00:00:01.250 = 100%
using deflate_fast:
F:jatfzlib123>timer minigzip -9 fp.log
Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 0.015 = 00:00:00.015 = 0%
User Time = 1.765 = 00:00:01.765 = 97%
Process Time = 1.781 = 00:00:01.781 = 98%
Global Time = 1.812 = 00:00:01.812 = 100%
P.S.: miniBB doesn't process backslashes correctly. It eats all backslashes now.
patch is here:
http://three.fsphost.com/rtfreesoft/zlib_deflate_lc2.patch
I recompiled OptiPNG with the hacked zlib and tried to optimize a 3458x5000 (600 dpi) 24bpp PNG image which had been optimized with pngout /y /b0 before.
Here is the result:
OptiPNG 0.5.5: Advanced PNG optimizer.
Copyright (C) 2001-2007 Cosmin Truta.
** Processing: I:\01.png
3458x5000 8-bit RGB non-interlaced
Input IDAT size = 16322560 bytes
Input file size = 16325347 bytes
Trying...
zc = 9 zm = 9 zs = 0 f = 0 IDAT size = 34730311
zc = 9 zm = 9 zs = 1 f = 0 IDAT size = 34730311
zc = 9 zm = 9 zs = 3 f = 0 IDAT too big
zc = 9 zm = 9 zs = 0 f = 5 IDAT size = 17515643
zc = 9 zm = 9 zs = 1 f = 5 IDAT size = 17515643
zc = 9 zm = 9 zs = 3 f = 5 IDAT size = 16150950
Selecting parameters:
zc = 9 zm = 9 zs = 3 f = 5 IDAT size = 16150950
Output IDAT size = 16150950 bytes (171610 bytes decrease)
Output file size = 16153737 bytes (171610 bytes = 1.05% decrease)
Awesome!!
What? Where can I download this OptiPNG?
http://three.fsphost.com/rtfreesoft/optipng.7z (in reply to John)
One additional thing you can do - change to:
/* 4 */ {4, 4, 16, 16, deflate_slow}, /* lazy matches */
/* 5 */ {8, 16, 32, 32, deflate_slow},
/* 6 */ {8, 16, 128, 128, deflate_slow},
/* 7 */ {8, 32, 128, 256, deflate_slow},
/* 8 */ {32, 128, 258, 1024, deflate_slow},
-/* 9 */ {32, 258, 258, 4096, deflate_slow}}; /* max compression */
+/* 9 */ {32, 258, 258, 4096, deflate_fast}}; /* max compression */
Also you can separate "deflate_fast()", adding "deflate_max()":
/* 4 */ {4, 4, 16, 16, deflate_slow}, /* lazy matches */
/* 5 */ {8, 16, 32, 32, deflate_slow},
/* 6 */ {8, 16, 128, 128, deflate_slow},
/* 7 */ {8, 32, 128, 256, deflate_slow},
/* 8 */ {32, 128, 258, 1024, deflate_slow},
-/* 9 */ {32, 258, 258, 4096, deflate_slow}}; /* max compression */
+/* 9 */ {258, 258, 258, 8192, deflate_fast}}; /* max compression */
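For reference, the five fields being tuned in these tables are the ones from zlib's own config struct in deflate.c:

```c
typedef struct config_s {
   ush good_length; /* reduce lazy search above this match length */
   ush max_lazy;    /* do not perform lazy search above this match length */
   ush nice_length; /* quit search above this match length */
   ush max_chain;
   compress_func func;
} config;
```

max_chain is the limit on how many hash-chain entries longest_match() will follow, so the second variant presumably buys most of its extra ratio (and loses its speed) from raising it to 8192.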
recompress by pngout again:
In:16153737 bytes I:\01.png /c2 /f5
Out:16506083 bytes I:\01.png /c2 /f5
Unable to compress further
recompress by advpng:
I:\>advpng -z -4 01.png
16153737 16153737 100% 01.png (Bigger 3321283
16153737 16153737 100%
recompress by advdef:
I:\>advdef -z -4 01.png
16153737 16067648 99% 01.png
16153737 16067648 99%
I know that, but I want less code duplication in the library. (in reply to encode)
And I hope you can find a faster implementation of that, because it is d**n slow.
The authors of ZLIB/GZIP, to speed up compression, don't do an extra match search - they just delay the match output and decide only at the next step (which is where lazy matching, or lazy evaluation, comes from). Maybe it is possible to do the same thing with a 2-byte lookahead. I don't like such an approach anyway: firstly, it's not clear, and secondly, for the extra match searches we can use a simplified version of longest_match(). For example, we can start by searching for a 4-byte string, and we can also stop the search as soon as a longer match is found, instead of searching for the longest match possible.
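A rough sketch of such a simplified search, assuming it sits in deflate.c next to longest_match() (the name exists_longer_match is made up, and a real version would also have to cap the comparison at s->lookahead):

```c
/* Walk the hash chain and stop as soon as ANY match longer than cur_len is
 * found, instead of hunting for the longest possible one. Returns the length
 * found, or 0 if nothing longer exists within the chain/distance limits. */
static uInt exists_longer_match(deflate_state *s, IPos cur_match, uInt cur_len)
{
    Bytef   *scan  = s->window + s->strstart;
    IPos     limit = s->strstart > (IPos)MAX_DIST(s) ?
                     s->strstart - (IPos)MAX_DIST(s) : 0;
    unsigned chain = s->max_chain_length;

    do {
        Bytef *match = s->window + cur_match;
        uInt   len   = 0;
        while (len < MAX_MATCH && scan[len] == match[len])
            len++;
        if (len > cur_len)
            return len;                 /* good enough - bail out early */
        cur_match = s->prev[cur_match & s->w_mask];
    } while (cur_match > limit && --chain != 0);

    return 0;
}
```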
Also, with Deflate, I think it is better to use a multiplicative hash instead of this XOR crap...
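For example (a hedged sketch, not something zlib does today), a multiplicative hash over the MIN_MATCH bytes could look like this:

```c
/* Hash the next 3 bytes with a multiplicative (Fibonacci-style) hash instead
 * of zlib's shift/XOR rolling hash. hash_bits would match s->hash_bits (~15);
 * the multiplier is the common golden-ratio-derived constant. */
static unsigned mul_hash(const unsigned char *p, unsigned hash_bits)
{
    unsigned x = (unsigned)p[0]
               | ((unsigned)p[1] << 8)
               | ((unsigned)p[2] << 16);
    return (x * 2654435761u) >> (32 - hash_bits);
}
```

Unlike the rolling XOR hash it is not updated incrementally, so every position needs a fresh 3-byte read - which incidentally makes the random-access requirement discussed above trivial.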
The deflate code in advpng and advdef is based on an old, slow 7-Zip branch... it's not updated to the latest version.
7-Zip versions:
4.44 has speed optimizations in deflate
4.43 zip is multithreaded
4.42 has improved zip/deflate/gzip compression ratio in ultra mode
So I think that if the AdvanceCOMP package were recompiled with the latest 7-Zip deflate engine it would give more size reduction... even better if OptiPNG had an option to encode with the 7-Zip deflate engine as an alternative to zlib...
But there is still DeflOpt, which can still squeeze out more.
It sounds like DeflOpt optimizes the Huffman part only, just like what jpegtran -optimize does. (in reply to encode)
If all these proggies were joined into one program, I think it could ring the bell.
And the AdvanceCOMP package added MNG/JNG support, which is new.
People will be happy if you can do this for them. :-] (in reply to maadjordan)
I tried the optipng_sse2.exe with a 1 MB PNG.
Originally it's:
1.312.296
With optipng_sse2.exe:
1.300.953
And with pngout it's:
1.268.870
1 MB is not big enough to beat pngout. (in reply to John)
Have you tested advdef or advpng with the -z -4 switch on the original, then run DeflOpt on the result?
http://www.encode.su/forums/index.php?action=vthread&forum=1&topic=499&page=1#msg5758 (in reply to maadjordan)
DeflOpt can usually squeeze out a few bytes. It shrinks that file by less than 20 bytes IIRC.