Results 1 to 19 of 19

Thread: Ben Jos Walbeehm's DeflOpt: what does it actually do?

  1. #1
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts

    Ben Jos Walbeehm's DeflOpt: what does it actually do?

    Hello,
    first a link to this DeflOpt tool.

    As the author wrote:
    "Contrary to what I have seen mentioned on some pages, DeflOpt does not just remove garbage from deflated files, but actually optimises the structures created by deflate programs and then recodes the files using those optimised structures. It performs a few other tricks too that are not "just removing garbage". It will make a file smaller more than 95% of the time, even if that file was created by a deflate program that does not generate "garbage", and even if it was created by Ken Silverman's kzip or pngout programs.

    DeflOpt should be used as the final step in getting files smaller. First use other programs (I recommend kzip for ZIP files and pngout for PNG files), then run DeflOpt on the resulting files."

    Any idea about what kind of structures optimisations he is talking about?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Code:
      Block      :           335
      Blocktype  :             1  (Fixed trees)
      Offset     :    12,555,965
      Size (unc) :         9,251 bytes
      Size (unc) :        74,008 bits
      Bits (com) :         4,051  (18.27 bits/bit)
      Bits stored:        74,047
    
      Block      :           336
      Blocktype  :             2  (Dynamic trees; L-codes: 286; D-codes: 30)
      Offset     :    12,565,216
      Size (unc) :        35,645 bytes
      Size (unc) :       285,160 bits
      Bits (com) :         5,703  (50.00 bits/bit)
        18 C-codes (54 bits); 690 bits for trees; 761 bits total (95.125 bytes)
      Bits (new) :         5,703
        18 C-codes (54 bits); 690 bits for trees; 761 bits total (95.125 bytes)
      Bits fixed :         6,563
      Bits stored:       285,196
      Bits (redo):         5,703
        18 C-codes (54 bits); 690 bits for trees; 761 bits total (95.125 bytes)
    In verbose mode it prints stuff like that.
    So imho it tries to optimize the compression of deflate block headers.

  3. #3
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    878
    Thanks
    51
    Thanked 106 Times in 84 Posts
    Would be nice if it was implanted into pngout maybe with a switch to active it like /DO

    that way you could still make small improvements with pngout /r trials that would be masked by deflopt.


    .e.g

    pngout first try 21354 bytes
    first try wigt deflop 21.302 (saved52 bytes)

    pngt after several random trials = 21348
    but this is bigger then the input file deflopt files so it is not saved but pngout and you "miss" it

    otherwise you need to save the original and do all trield with deflopt efter pngout and compare later.

    better just to implant into pngout then. much easier
    Last edited by SvenBent; 11th September 2009 at 11:44.

  4. #4
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    Quote Originally Posted by SvenBent View Post
    Would be nice if it was implanted into pngout maybe with a switch to active it like /DO
    This would be easy if pngout and deflopt were open source projects... I tried to get in touch with Ben Jos Walbeehm twice but got no answer (I'd like to port DeflOpt to Darwin/Mac OS X).

    Ken Silverman could perhaps integrate DeflOpt if we asked him to and we had access to DeflOpt sources.

    Since Internet Explorer 6 market share (and its buggy PNG support) is slowly fading a way there's going to be a renewed interest for tools able to optimise .png image files.

  5. #5
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    878
    Thanks
    51
    Thanked 106 Times in 84 Posts
    I dont think IE 6 has such a buggy PNG support that it would matter
    as i recall its only the gamma features the doesn't work ( i might be wrong)

    however i really would love if peopel new there is other "web" formats than jpeg. it really hurts my eyes to se people making grahs in jpgs. og black/white clipart.

  6. #6
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post

    defluff 0.1.0

    Hello,

    I'm also not happy with the situation of deflopt (especially that there is no publicly available Linux binary). So I decided to do some research and started a similar project - "defluff".

    I already implemented some of my ideas. But while there is still much more to do, I now reached a point, where it could already be useful for others. That's why I decided to publish some first results in this forum:

    Code:
    ~$ gzip -9 <enwik8 >enwik8-gzip9.gz
    36445241 enwik8-gzip9.gz
    36444728 enwik8-gzip9-defluff.gz
    36444655 enwik8-gzip9-deflopt.gz
    36444597 enwik8-gzip9-deflopt-defluff.gz
    36444596 enwik8-gzip9-defluff-deflopt.gz
    
    ~$ 7z a -tgzip -mfb=258 -mpass=16 -si enwik8-7z258.gz <enwik8
    35100598 enwik8-7z258.gz
    35075076 enwik8-7z258-deflopt.gz
    35074920 enwik8-7z258-defluff.gz
    35074624 enwik8-7z258-deflopt-defluff.gz
    35074617 enwik8-7z258-defluff-deflopt.gz
    
    ~$ 7z a -tgzip -mfb=245 -mpass=16 -si enwik8-7z245.gz <enwik8
    35100410 enwik8-7z245.gz
    35074903 enwik8-7z245-deflopt.gz
    35074738 enwik8-7z245-defluff.gz
    35074441 enwik8-7z245-deflopt-defluff.gz
    35074436 enwik8-7z245-defluff-deflopt.gz
    
    (generated with 'kzip -n247 -r' and converted to gzip format)
    34975480 enwik8-kzip247.gz
    34974303 enwik8-kzip247-deflopt.gz
    34974083 enwik8-kzip247-defluff.gz
    34974051 enwik8-kzip247-defluff-deflopt.gz
    34974038 enwik8-kzip247-deflopt-defluff.gz
    I hereby claim that the last file contains the smallest deflate representation of enwik8 ever created

    Defluff doesn't provide all the algorithms of deflopt, yet. But it uses some other strong ones. Therefore, the results heavily depend on the kind of compressed data, and running both programs on a file usually gives the best results.

    For those that are interested, I attached a x86 Linux binary. It acts like a filter, reading from stdin and writing to stdout. Currently it takes no arguments and only supports "simple" gzip files (those without an embedded file name or other extensions).

    Update: Have a look at defluff's dedicated thread for newer versions.
    Last edited by jo.henke; 22nd February 2011 at 23:56.

  7. #7
    Member
    Join Date
    May 2008
    Location
    England
    Posts
    325
    Thanks
    18
    Thanked 6 Times in 5 Posts
    Nice, any chance of a windows binary?

  8. #8
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    To be really useful for a Windows user, defluff probably needs to support zip files first. I think this should be my next target.

  9. #9
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Results look very good! Could this be used for recompression of PNG images in future? deflopt can do that
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  10. #10
    Member jo.henke's Avatar
    Join Date
    Dec 2008
    Location
    Europe
    Posts
    56
    Thanks
    0
    Thanked 4 Times in 1 Post
    Yes, defluff could theoretically compact any file that contains deflate streams. I would "just" have to add a handler for this file format - and PNG support is on my own wish list But please don't expect anything to soon.

  11. #11
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    335
    Thanks
    36
    Thanked 36 Times in 21 Posts
    thanks for your program .. waiting or win version

  12. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    I just noticed that deflopt not only optimizes the headers, but also re-splits the LZ token stream into blocks,
    Code:
    book1.gzip
    type=2 h.size=   71 c.size=57764 d.size=32767
    type=2 h.size=   74 c.size=59238 d.size=32767
    type=2 h.size=   74 c.size=59491 d.size=32767
    type=2 h.size=   69 c.size=59879 d.size=32767
    type=2 h.size=   69 c.size=59918 d.size=32767
    type=2 h.size=   67 c.size=15543 d.size= 8504
    
    book1.gzip.deflopt
    type=2 h.size=   71 c.size=57760 d.size=32809
    type=2 h.size=   74 c.size=59234 d.size=32817
    type=2 h.size=   74 c.size=59467 d.size=33097
    type=2 h.size=   70 c.size=59855 d.size=33099
    type=2 h.size=   70 c.size=59891 d.size=33125
    type=2 h.size=   67 c.size=15536 d.size= 8592
    and even modifies the actual tokens (though diff is pretty small)

    Code:
    raw2dec book1__izip.deflopt.raw 1_dec
    raw2dec book1__izip.raw         2_dec
    dec2unpdif.exe 1_dec 1_unp 1_dif 9
    dec2unpdif.exe 2_dec 2_unp 2_dif 9
    
    1_dec 694156
    1_unp 768771
    1_dif   3688
    
    2_dec 689356
    2_unp 768771
    2_dif      4
    So its actually surprising how the gain from deflpopt is so tiny :)

  13. #13
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    Quote Originally Posted by Shelwien View Post
    I just noticed that deflopt not only optimizes the headers, but also re-splits the LZ token stream into blocks,
    Code:
    book1.gzip
    type=2 h.size=   71 c.size=57764 d.size=32767
    type=2 h.size=   74 c.size=59238 d.size=32767
    type=2 h.size=   74 c.size=59491 d.size=32767
    type=2 h.size=   69 c.size=59879 d.size=32767
    type=2 h.size=   69 c.size=59918 d.size=32767
    type=2 h.size=   67 c.size=15543 d.size= 8504
    
    book1.gzip.deflopt
    type=2 h.size=   71 c.size=57760 d.size=32809
    type=2 h.size=   74 c.size=59234 d.size=32817
    type=2 h.size=   74 c.size=59467 d.size=33097
    type=2 h.size=   70 c.size=59855 d.size=33099
    type=2 h.size=   70 c.size=59891 d.size=33125
    type=2 h.size=   67 c.size=15536 d.size= 8592
    What did you use to produce the gzip file?
    I can't reproduce this behaviour using gzip -9 and deflopt -b:
    Code:
    book1.gz
    T Boundary     Size
    2        0   462675
    2    22052   474497
    2    45701   476518
    2    69430   479584
    2    8d8d7   479899
    2    b234b   124881
    2498054 bits long, composed of 6 blocks
    
    book1.gz.deflopt
    T Boundary     Size
    2        0   462646
    2    22052   474463
    2    45701   476330
    2    69430   479400
    2    8d8d7   479687
    2    b234b   124826
    2497352 bits long, composed of 6 blocks
    
    book1.gz.defluff
    T Boundary     Size
    2        0   462652
    2    22052   474472
    2    45701   476330
    2    69430   479400
    2    8d8d7   479687
    2    b234b   124826
    2497367 bits long, composed of 6 blocks
    I knew that deflopt could fuse blocks, for instance this PNG:
    Code:
    temp-n6.png
    T Boundary     Size
    2        0     5282
    2      e5e     8505
    2     249f     5768
    2     3820    12032
    2     4b41     2730
    2     4e6f     1749
    36066 bits long, composed of 6 blocks
    
    temp-n6.png.deflopt
    T Boundary     Size
    2        0     5255
    2      e5e     8391
    2     249f     5679
    2     3820    11985
    1     4b41     4074
    35384 bits long, composed of 5 blocks
    
    temp-n6.png.defluff
    T Boundary     Size
    2        0     5243
    2      e5e     8391
    2     249f     5683
    2     3820    11984
    1     4b41     2584
    1     4e6f     1500
    35385 bits long, composed of 6 blocks
    
    temp-n6.png.huffmix (cherry picked the smallest blocks produced by deflopt & defluff)
    T Boundary     Size
    2        0     5243
    2      e5e     8391
    2     249f     5679
    2     3820    11984
    1     4b41     4074
    35371 bits long, composed of 5 blocks
    Where "Boundary" (start of a new block) is located in the uncompressed data (bytes in hexadecimal) and "Size" is the compressed size (bits in decimal) of this Deflate block.
    What are h.size, c.size and d.size?
    Last edited by caveman; 21st October 2011 at 03:38. Reason: Added defluff results

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    > What did you use to produce the gzip file?

    gzip -9 and deflopt 2.07

    > I knew that deflopt could fuse blocks, for instance this PNG

    It seems like (maybe partial) reparsing of match tokens, not at the level of "fusing"

    > What are h.size, c.size and d.size?

    h.size is a size of compressed block header (rounded to a byte),
    c.size is the size of compressed block data,
    d.size is the number of tokens in the block (not counting EOF/end-of-block code)

  15. #15
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    Quote Originally Posted by Shelwien View Post
    It seems like (maybe partial) reparsing of match tokens, not at the level of "fusing"
    Here is the kind of transform I've found:
    2078c2078,2080
    < [18] (3,3220)
    ---
    > [6] 61
    > [5] 0A
    > [6] 72

    A match of length 3 that takes 18 bits is replaced by 3 literals (6+5+6 = 17 bits)

    65607c65699,65701
    < [19] (3,3270)
    ---
    > [6] 69
    > [6] 73
    > [6] 73

    A match of length 3 that takes 19 bits is replaced by 3 literals (6+6+6 = 18 bits).

    You can check it yourself the enclosed "book1.diff.txt" file is easy to read: the number inside the square brackets is the size in bits in the compressed stream, followed by a literal or a match pair (length, distance).

    Quote Originally Posted by Shelwien View Post
    h.size is a size of compressed block header (rounded to a byte),
    c.size is the size of compressed block data,
    d.size is the number of tokens in the block (not counting EOF/end-of-block code)
    Ok, I'll update defdb to also report the header size and token count.
    Attached Files Attached Files
    Last edited by caveman; 21st October 2011 at 17:31.

  16. #16
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    I have found a nasty bug in DeflOpt.
    If the size of the optimized deflate stream is more than 2^32 bits (not bytes) the output file is largely truncated.
    For instance book1 concatenated 1721 times produces in a 1,323,054,891 bytes long file (1.3 GB).
    Applying gzip -6 on this file results in a 537,200,678 bytes long file (537 MB) its deflate stream is 4,297,605,230 bits long (that's slightly more than 2^32 but actually not a problem).
    Trouble arise when DeflOpt reports this:
    Number of bytes saved: 537,007,610 (537,200,678 --> 193,06 (1,093,587 bits)

    The output file is effectively only 193,068 bytes long... and totally screwed.

    1720 times book1 is still perfectly optimized (4,295,108,225 bits before 4,294,015,192 bits after deflopt).

  17. #17
    Member nikkho's Avatar
    Join Date
    Jul 2011
    Location
    Spain
    Posts
    546
    Thanks
    219
    Thanked 164 Times in 105 Posts
    Just noticed original Ben's page at http://www.walbeehm.com is now lost.
    Do you know if he moved somewhere?

  18. #18
    Member caveman's Avatar
    Join Date
    Jul 2009
    Location
    Strasbourg, France
    Posts
    190
    Thanks
    8
    Thanked 62 Times in 33 Posts
    Quote Originally Posted by nikkho View Post
    Just noticed original Ben's page at http://www.walbeehm.com is now lost.
    Do you know if he moved somewhere?
    The page was still there about 3 weeks ago, he wrote something about DeflOpt in 2012 (he wanted to work on it again), don't call him "Ben" but rather "Ben Jos".

  19. #19
    Member
    Join Date
    May 2008
    Location
    HK
    Posts
    160
    Thanks
    4
    Thanked 25 Times in 15 Posts
    walbeehm.com expired on 04/26/2014 and is pending renewal or deletion.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •