
Thread: gzip-1.2.4-hack - a hacked version of gzip

  1. #31 - encode
    Compared to 1.2.4, GZIPHACK is compiled with a much newer compiler - Visual C++ 2005 SP1. However, the newer gzip 1.3.5 has higher compression speed anyway.

  2. #32 - Black_Fox
    Would it be possible (I mean in a few minutes/hours of an afternoon, not a week of work) to "hack" CABARC to use a bigger dictionary than 2MB, let's say 32MB?

  3. #33 - encode
    I think not. My guesses:
    + No source code for the LZX encoder is available
    + It may be possible to crack the executable, but compatibility would be lost anyway

    Better to use LZMA directly...


  4. #34 - Bulat Ziganshin
    Quote Originally Posted by Black_Fox
    Would it be possible (I mean in a few minutes/hours of an afternoon, not a week of work) to "hack" CABARC to use a bigger dictionary than 2MB, let's say 32MB?
    I've seen an LZX-compatible compressor, but I'm not sure it used optimal parsing.

    What do you need it for? It can't be decompressed by existing LZX tools, so LZMA is better anyway.

  5. #35 - roytam1
    Quote Originally Posted by encode
    Note that the authors of GZIP/ZLIB were informed about this improvement
    I know that from comp.compression, but I still hope you can make a hack for zlib too.

  6. #36 - maadjordan
    You can get a full LZX encoder/decoder and the specs from the following links.

    The Cabinet file specs from Microsoft were used. Download it at:

    http://www.speakeasy.org/~russotto/chm/

    This encoder is even used for NTFS support under Linux in some programs (I can't remember which ones now).

    http://www.cabextract.org.uk/libmspack/

    It's a compact library to unpack most of Microsoft's old compression algorithms.

    BTW, I contacted Igor Pavlov years ago (2002, I recall) about the LZX implementation, and he told me that his BIX archiver was based on a similar algorithm but with a larger dictionary (4MB, actually) and an extra x86 filter. So here is the full info for whom it may concern.
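
    For anyone who wants to try libmspack, here is a minimal extraction sketch (assuming the mscab_decompressor interface from mspack.h; error handling is mostly omitted):

    #include <stdio.h>
    #include <mspack.h>

    int main(void)
    {
        struct mscab_decompressor *cabd = mspack_create_cab_decompressor(NULL);
        struct mscabd_cabinet *cab;
        struct mscabd_file *f;

        if (!cabd) return 1;
        cab = cabd->open(cabd, "test.cab");
        if (!cab) { fprintf(stderr, "can't open test.cab\n"); return 1; }

        for (f = cab->files; f; f = f->next)     /* walk the stored files */
            cabd->extract(cabd, f, f->filename); /* decompress (MSZIP/LZX/...) */

        cabd->close(cabd, cab);
        mspack_destroy_cab_decompressor(cabd);
        return 0;
    }

    Compile with -lmspack after installing the library.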

  8. #38 - donkey7
    Quote Originally Posted by Black_Fox
    Would it be possible (I mean in a few minutes/hours of an afternoon, not a week of work) to "hack" CABARC to use a bigger dictionary than 2MB, let's say 32MB?
    I've tried hacking cabarc.exe. I relaxed the limits on the LZX compression levels (the instruction to change is at offset 40210 in the executable) to 12..27 instead of the default 15..21, but when I select values outside the default range, the internal FCI library throws error 8 when adding a file and the program crashes. The exception is level 22, which doesn't print an error message but crashes anyway.

    I think the errors are caused by limitations of the file format. I guess CAB uses 3 bits to describe the compression level used (3 bits = 8 possible values; by default LZX uses only seven compression levels, which I think is why level 22 doesn't generate an error in FCIAddFile()). Additionally, LZX uses a fixed number of position slots, so it would crash when it finds a match whose offset won't fit in any position slot (I guess this is what causes the crash at level 22). A sketch of the slot limit follows at the end of this post.

    So hacking CABARC to support larger dictionaries would require major changes to both the file format and the executable itself. But maybe I'll hack it someday - it'd be fun to do.

    btw:
    On textual files CABARC and 7-Zip produce almost identically sized archives (for the same dictionary size), so LZMA can serve as some indication.
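
    To illustrate the position-slot limit, here is a small counting sketch, derived from the position_base/extra_bits layout in the public LZX documentation and libmspack (not from cabarc itself): the per-slot offset range doubles every two slots and is capped at 17 extra bits.

    #include <stdio.h>

    int main(void)
    {
        unsigned long window = 1UL << 25; /* 32MB target window */
        unsigned long base = 0;           /* lowest offset covered by the next slot */
        int slot = 0;

        while (base < window) {
            int eb = (slot < 4) ? 0 : slot / 2 - 1; /* extra bits for this slot */
            if (eb > 17) eb = 17;                   /* LZX caps extra bits at 17 */
            base += 1UL << eb;                      /* each slot spans 2^eb offsets */
            slot++;
        }
        printf("position slots needed: %d\n", slot);
        return 0;
    }

    Under these assumptions a 2MB window fits in 50 slots, while a 32MB window would need around 290, so just relaxing the level check in the executable can never be enough.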

  9. #39 - Black_Fox
    Quote Originally Posted by Bulat Ziganshin
    What do you need it for?
    I've been curious how the ratio and speeds would change.
    Quote Originally Posted by maadjordan
    bix download..
    Thanks for all the links! I'll see what BIX can do...
    Quote Originally Posted by donkey7
    so hacking CABARC to support larger dictionaries would require major changes to both the file format and the executable itself
    That seems like too much work, unless a lot more people were interested in the outcome.

  10. #40 - maadjordan
    LZH is somehow a sort of LZX... Squeeze compression is based on the LZH algorithm, but with filters and extra dictionaries.

    I've tested all of these (BIX, SQZ, LH7, ...) but with no luck. LZX is good but slow, and if you increase the dictionary size it becomes even slower, unless the process is multi-threaded...

  11. #41 - roytam1
    I hacked zlib then, but it seems it's not working (CRC error):
    /* If prev_match is also MIN_MATCH, match_start is garbage
     * but we will ignore the current match anyway.
     */
    s->match_length = MIN_MATCH-1;
    }
    }

    + /* a small HACK written in 5 minutes
    +  * TODO: write more clever implementation
    +  */
    + if (s->match_length >= MIN_MATCH && s->level == 9) {
    +     int tmp_strstart = s->strstart;      /* store global variables */
    +     int tmp_match_start = s->match_start;
    +     int dist = s->strstart - s->match_start;
    +     int next_len;
    +     int next_dist;
    +     unsigned hash = s->ins_h;
    +     int i;
    +
    +     /* lazy matching with 2 byte lookahead */
    +     for (i = 0; i < 2; i++) {
    +         UPDATE_HASH(s, s->ins_h, s->window[(++s->strstart) + MIN_MATCH-1]);
    +
    +         next_len = longest_match(s, hash_head); /* get match length and distance */
    +         next_dist = s->strstart - s->match_start;
    +
    +         /* check for a better match,
    +          * also check the distance of the following match */
    +         if ((next_len > ((s->match_length + 1) + i))
    +             || ((next_len > (s->match_length + i)) && ((next_dist >> 3) < dist))) {
    +
    +             s->match_length = 0; /* discard current match */
    +             break;
    +         }
    +     }
    +
    +     s->strstart = tmp_strstart;          /* restore values */
    +     s->match_start = tmp_match_start;
    + }
    + /* End of hack? */
    +
    /* If there was a match at the previous step and the current
     * match is not better, output the previous match:
     */
    if (s->prev_length >= MIN_MATCH && s->match_length <= s->prev_length) {
        uInt max_insert = s->strstart + s->lookahead - MIN_MATCH;
  12. #42 - encode
    I just patched deflate_fast(), inserting one extra check of sorts. The string check must be placed between the main string search and the match output:
    if (hash_head != NIL && s->strstart - hash_head <= MAX_DIST(s)) {
        /* To simplify the code, we prevent matches with the string
         * of window index 0 (in particular we have to avoid a match
         * of the string with itself at the start of the input file).
         */
    #ifdef FASTEST
        if ((s->strategy != Z_HUFFMAN_ONLY && s->strategy != Z_RLE) ||
            (s->strategy == Z_RLE && s->strstart - hash_head == 1)) {
            s->match_length = longest_match_fast (s, hash_head);
        }
    #else
        if (s->strategy != Z_HUFFMAN_ONLY && s->strategy != Z_RLE) {
            s->match_length = longest_match (s, hash_head);
        } else if (s->strategy == Z_RLE && s->strstart - hash_head == 1) {
            s->match_length = longest_match_fast (s, hash_head);
        }
    #endif
        /* longest_match() or longest_match_fast() sets match_start */
    }

    HERE:
    if (s->match_length >= MIN_MATCH) {

        TODO: Place code here

    }

    if (s->match_length >= MIN_MATCH) {
        check_match(s, s->strstart, s->match_start, s->match_length);

        _tr_tally_dist(s, s->strstart - s->match_start,
                       s->match_length - MIN_MATCH, bflush);
    ...
    Yes, the end of the hack is where we restore the global variables.

    The code from GZIPHACK should work with some changes. Check the hash calculation: in the original ZLIB/GZIP the hash value is updated sequentially, but we need random access to the hash - i.e. we must know the real hash value at any position (i+1, i+2) - and we must not change the global hash value. In my implementation I keep the original hash (plus the global variables needed for longest_match()) and directly modify that copy for the next 1-2 bytes, roughly as in the sketch below.
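
    A minimal sketch of that idea, assuming zlib's UPDATE_HASH macro and deflate_state layout (the strstart save/restore and the actual longest_match() calls are as in the patch above):

    uInt hash = s->ins_h; /* private copy of the rolling hash */
    uInt pos  = s->strstart;
    int  i;

    for (i = 1; i <= 2; i++) {
        /* advance the copy over one more input byte, exactly as
         * INSERT_STRING would, but only on the local variable -
         * the global s->ins_h stays untouched */
        UPDATE_HASH(s, hash, s->window[pos + i + MIN_MATCH - 1]);

        /* s->head[hash] is now the chain head for position pos + i;
         * temporarily set s->strstart = pos + i, run longest_match()
         * from that head, then restore s->strstart afterwards */
    }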

    As I can see:
    Quote Originally Posted by roytam1
    /* If there was a match at the previous step and the current
    * match is not better, output the previous match:
    */
    You are modifying the deflate variant with lazy matches (deflate_slow). You shouldn't. Modify the fast version of deflate, since that function finds and outputs matches straight away!

    Good luck!

  13. #43 - roytam1
    Quote Originally Posted by encode
    Modify the fast version of deflate, since that function finds and outputs matches straight away!
    OK, it works now.

    test data: world95.txt (text-test.rar from MC)

    26/08/2007 09:50 863,370 world95_9.txt.gz
    26/08/2007 10:14 857,939 world95_lc2_9.txt.gz

    test data: fp.log (log-test.rar from MC)

    26/08/2007 10:17 1,333,125 fp_9.log.gz
    26/08/2007 10:15 1,313,936 fp_lc2_9.log.gz

  14. #44 - roytam1
    Timer data:

    using deflate_slow:
    F:jatfzlib123>timer minigzip -9 fp.log

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10

    Kernel Time = 0.046 = 00:00:00.046 = 3%
    User Time = 1.093 = 00:00:01.093 = 87%
    Process Time = 1.140 = 00:00:01.140 = 91%
    Global Time = 1.250 = 00:00:01.250 = 100%

    using deflate_fast:
    F:jatfzlib123>timer minigzip -9 fp.log

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10

    Kernel Time = 0.015 = 00:00:00.015 = 0%
    User Time = 1.765 = 00:00:01.765 = 97%
    Process Time = 1.781 = 00:00:01.781 = 98%
    Global Time = 1.812 = 00:00:01.812 = 100%

    P.S.: miniBB doesn't process backslashes correctly. It eats all backslashes now.

  16. #46 - roytam1
    I recompiled OptiPNG with the hacked zlib and tried to optimize a 3458x5000 (600dpi) 24bpp PNG image which had been optimized with pngout /y /b0 beforehand.

    Here is the result:

    OptiPNG 0.5.5: Advanced PNG optimizer.
    Copyright (C) 2001-2007 Cosmin Truta.

    ** Processing: I:\01.png
    3458x5000 8-bit RGB non-interlaced
    Input IDAT size = 16322560 bytes
    Input file size = 16325347 bytes
    Trying...
    zc = 9 zm = 9 zs = 0 f = 0 IDAT size = 34730311
    zc = 9 zm = 9 zs = 1 f = 0 IDAT size = 34730311
    zc = 9 zm = 9 zs = 3 f = 0 IDAT too big
    zc = 9 zm = 9 zs = 0 f = 5 IDAT size = 17515643
    zc = 9 zm = 9 zs = 1 f = 5 IDAT size = 17515643
    zc = 9 zm = 9 zs = 3 f = 5 IDAT size = 16150950

    Selecting parameters:
    zc = 9 zm = 9 zs = 3 f = 5 IDAT size = 16150950

    Output IDAT size = 16150950 bytes (171610 bytes decrease)
    Output file size = 16153737 bytes (171610 bytes = 1.05% decrease)

    Awesome!!

  17. #47 - John
    What? Where can I download this OptiPNG?

  18. #48 - roytam1
    Quote Originally Posted by John
    Where can I download this OptiPNG?
    http://three.fsphost.com/rtfreesoft/optipng.7z

  19. #49 - encode
    One additional thing you can do:
    /* 4 */ {4, 4, 16, 16, deflate_slow}, /* lazy matches */
    /* 5 */ {8, 16, 32, 32, deflate_slow},
    /* 6 */ {8, 16, 128, 128, deflate_slow},
    /* 7 */ {8, 32, 128, 256, deflate_slow},
    /* 8 */ {32, 128, 258, 1024, deflate_slow},
    -/* 9 */ {32, 258, 258, 4096, deflate_slow}}; /* max compression */
    +/* 9 */ {32, 258, 258, 4096, deflate_fast}}; /* max compression */
    change to:
    /* 4 */ {4, 4, 16, 16, deflate_slow}, /* lazy matches */
    /* 5 */ {8, 16, 32, 32, deflate_slow},
    /* 6 */ {8, 16, 128, 128, deflate_slow},
    /* 7 */ {8, 32, 128, 256, deflate_slow},
    /* 8 */ {32, 128, 258, 1024, deflate_slow},
    -/* 9 */ {32, 258, 258, 4096, deflate_slow}}; /* max compression */
    +/* 9 */ {258, 258, 258, 8192, deflate_fast}}; /* max compression */
    You could also split "deflate_fast()" off into a separate "deflate_max()".
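
    For reference, the five columns being tuned there are the fields of zlib's config struct in deflate.c:

    typedef struct config_s {
       ush good_length; /* reduce lazy search above this match length */
       ush max_lazy;    /* do not perform lazy search above this match length */
       ush nice_length; /* quit search above this match length */
       ush max_chain;   /* maximum hash-chain length to search */
       compress_func func;
    } config;

    With deflate_fast the lazy-related fields barely matter, so the suggested level 9 line mainly deepens the search by raising max_chain to 8192.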

  20. #50 - roytam1
    Recompressed with pngout again:
    In:16153737 bytes I:\01.png /c2 /f5
    Out:16506083 bytes I:\01.png /c2 /f5
    Unable to compress further

    Recompressed with advpng:
    I:\>advpng -z -4 01.png
    16153737 16153737 100% 01.png (Bigger 3321283
    16153737 16153737 100%

    Recompressed with advdef:
    I:\>advdef -z -4 01.png
    16153737 16067648 99% 01.png
    16153737 16067648 99%

  21. #51 - roytam1
    Quote Originally Posted by encode
    You could also split "deflate_fast()" off into a separate "deflate_max()".
    I know that, but I want less code duplication in the library. And I hope you can find a faster implementation of this, because it is d**n slow.

  22. #52 - encode
    To speed up compression, the authors of ZLIB/GZIP, instead of doing an extra match search, simply delay the match output and decide only at the next step (which is where "lazy matching", or lazy evaluation, comes from). Maybe it is possible to do the same thing with a 2-byte lookahead. I don't like such an approach anyway: firstly it's not clear, and secondly, for the extra match searches we can use a simplified version of longest_match(). For example, we can search for a 4-byte string to start with, and we can stop the search as soon as any longer match is found, instead of searching for the longest match possible - see the sketch below.
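
    A sketch of that simplified search (a hypothetical helper written against zlib's deflate_state, not zlib code; bounds checks and the too-far-back test are omitted):

    static int find_longer_match(deflate_state *s, IPos cur_match, int cur_len)
    {
        const Bytef *scan = s->window + s->strstart;
        int chain = 32; /* small fixed chain budget */

        for (; cur_match != 0 && chain-- > 0;
               cur_match = s->prev[cur_match & s->w_mask]) {
            const Bytef *match = s->window + cur_match;
            int len;

            /* cheap prefix test: we only care about matches of 4+ bytes */
            if (match[0] != scan[0] || match[1] != scan[1] ||
                match[2] != scan[2] || match[3] != scan[3]) continue;

            len = 4;
            while (len < MAX_MATCH && scan[len] == match[len]) len++;

            if (len > cur_len) {
                s->match_start = cur_match; /* record where we matched */
                return len;                 /* stop: longer is good enough */
            }
        }
        return 0; /* nothing longer found */
    }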

    Also, with Deflate I think it is better to use a multiplicative hash instead of this XOR crap...
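
    A multiplicative hash there could be as simple as this sketch (HASH_BITS and the from-scratch 3-byte read are assumptions for illustration, not zlib's actual macros):

    #define HASH_BITS 15 /* assumed hash table size */

    /* Knuth-style multiplicative hash over the MIN_MATCH (3) bytes at p.
     * Computed from scratch at any position, it also gives the random
     * access discussed above, with no rolling state to save and restore. */
    static unsigned mul_hash(const unsigned char *p)
    {
        unsigned x = (unsigned)p[0] | ((unsigned)p[1] << 8) | ((unsigned)p[2] << 16);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }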


  23. #53 - maadjordan
    The deflate engine in advpng and advdef is based on an old, slow 7-Zip branch; it hasn't been updated to the latest version.

    7-Zip versions:
    4.44 has speed optimizations in deflate
    4.43 makes zip multithreaded
    4.42 improved the zip/deflate/gzip compression ratio in ultra mode

    So I think if the AdvanceCOMP package were recompiled with the latest 7-Zip deflate engine, it would reduce sizes further - even better if OptiPNG had an option to encode with the 7-Zip deflate engine as an alternative to zlib...

    But there is still DeflOpt, which can squeeze out a bit more.

  24. #54 - roytam1
    Quote Originally Posted by encode
    By the way, DeflOpt rocks! I keep all my data in ZIP files, running this tool recursively saves some disk space!
    It sounds like DeflOpt optimizes only the Huffman part, much like jpegtran -optimize does.

  25. #55 - maadjordan
    If all these proggies were joined into one program, I think it could really ring the bell.
    Also, the AdvanceCOMP package added MNG/JNG support, which is new.

  26. #56 - roytam1
    Quote Originally Posted by maadjordan
    So I think if the AdvanceCOMP package were recompiled with the latest 7-Zip deflate engine, it would reduce sizes further
    People will be happy if you can do this for them. :-]

  27. #57 - John
    I tried optipng_sse2.exe with a 1MB PNG.
    Originally it's:
    1.312.296

    With optipng_sse2.exe:
    1.300.953

    And with pngout it's:
    1.268.870

  28. #58 - roytam1
    Quote Originally Posted by John
    I tried optipng_sse2.exe with a 1MB PNG.
    Originally it's:
    1.312.296

    With optipng_sse2.exe:
    1.300.953

    And with pngout it's:
    1.268.870
    1MB is not big enough to beat pngout.

  29. #59 - maadjordan
    Have you tested advdef or advpng with the -z -4 switch on the original, and then run DeflOpt on the result?

  30. #60 - roytam1
    Quote Originally Posted by maadjordan
    Have you tested advdef or advpng with the -z -4 switch on the original, and then run DeflOpt on the result?
    http://www.encode.su/forums/index.php?action=vthread&forum=1&topic=499&page=1#msg5758

    DeflOpt can usually squeeze out a few more bytes. It shrank that file by less than 20 bytes, IIRC.

