
Thread: PrePAQ v2 (aka paq8o8pre v2)

  1. #1
    Programmer schnaader
    Hi!

    I just uploaded PrePAQ v2. Images inside PDFs are now wrapped
    inside a BMP header to improve compression and speed.

    Have a look at http://schnaader.info

    Some results (using the old PrePAQ combined with the old/new
    Precomp DLL; they should be almost the same for PrePAQ v2):

    FlashMX.pdf - 4.526.946 bytes
    paq8o8 -3 FlashMX_old.pcf - 2.203.093 bytes, 1487 s
    paq8o8 -4 FlashMX_old.pcf - 1.908.614 bytes, 11063 s

    paq8o8 -3 FlashMX_new.pcf - 1.885.408 bytes, 2452 s
    paq8o8 -4 FlashMX_new.pcf - 1.830.621 bytes, 3098 s

    As you can see, speed gets worse for -3, but compression improves;
    for -4, speed improves heavily while compression improves only
    slightly. Times were taken on my lame 800 MHz AMD CPU, by the
    way, so expect yours to be much better.

    If bmpModel were added to lpaq, lprepaq could improve
    compression this way, too. The same goes for paq9a or some future
    version that could be merged with Precomp.
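
    For illustration, a rough sketch of the BMP-wrapping idea (not the
    actual Precomp code; 24 bpp pixels and padding-free rows are
    simplifying assumptions here, and wrapAsBmp is a made-up name):

    Code:
    // Sketch: prepend a minimal BMP header to raw 24 bpp pixel data so a
    // BMP-aware model such as paq8's bmpModel can detect and use it.
    // BMP row padding to 4-byte boundaries is ignored for brevity.
    #include <cstdint>
    #include <cstring>
    #include <vector>
    
    #pragma pack(push, 1)
    struct BmpHeader {
        uint16_t magic       = 0x4D42;  // "BM"
        uint32_t fileSize    = 0;       // filled in below
        uint32_t reserved    = 0;
        uint32_t dataOffset  = 54;      // 14-byte file header + 40-byte info header
        uint32_t infoSize    = 40;      // BITMAPINFOHEADER
        int32_t  width       = 0;
        int32_t  height      = 0;
        uint16_t planes      = 1;
        uint16_t bitCount    = 24;      // assumed 24 bpp RGB
        uint32_t compression = 0;       // BI_RGB, uncompressed
        uint32_t imageSize   = 0;
        int32_t  xPpm = 2835, yPpm = 2835;
        uint32_t clrUsed = 0, clrImportant = 0;
    };
    #pragma pack(pop)
    
    std::vector<uint8_t> wrapAsBmp(const uint8_t* pixels, size_t len,
                                   int32_t width, int32_t height) {
        BmpHeader h;
        h.width     = width;
        h.height    = height;
        h.imageSize = static_cast<uint32_t>(len);
        h.fileSize  = static_cast<uint32_t>(sizeof(h) + len);
        std::vector<uint8_t> out(sizeof(h) + len);
        std::memcpy(out.data(), &h, sizeof(h));
        std::memcpy(out.data() + sizeof(h), pixels, len);
        return out;
    }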

    Greetings,
    schnaader
    http://schnaader.info
    Damn kids. They're all alike.

  2. #2
    Moderator
    Thanks Christian!

    Mirror: Download

  3. #3
    Administrator Shelwien
    Tried that image.
    Guess it was processed with something like pngout.
    Any ideas about some support for these tricky inflate variations?

    Code:
     
    Q:@Y>precomp.exe -v bluevi6.png 
     
    Precomp v0.3.7 - ALPHA version - USE FOR TESTING ONLY 
    Free for non-commercial use - Copyright 2006,2007 by Christian Schneider 
     
    Input file: bluevi6.png 
    Output file: bluevi6.pcf 
     
    Possible zLib-Stream in PNG found at position 37, windowbits = 15 
    Can be decompressed to 455287 bytes 
    No matches 
     
    New size: 209432 instead of 209409 
     
    Done. 
    Time: 297 ms 
     
    Decompressable streams: 1 
    Recompressed streams: 0 
     
    None of the given compression and memory levels could be used. 
    There will be no gain compressing the output file.
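
    The last lines refer to a search over zlib settings, roughly like the
    following simplified sketch (real Precomp also varies windowbits and
    memory levels - hence the 81 candidate streams mentioned later - and
    the function name here is made up):

    Code:
    // Simplified sketch of the recompression test behind the log above:
    // try each zlib level and check whether re-deflating the decompressed
    // data reproduces the original stream bit for bit.
    #include <zlib.h>
    #include <cstdint>
    #include <cstring>
    #include <vector>
    
    // Returns the matching zlib level (1..9), or -1 for streams like
    // pngout's where no standard zlib setting reproduces the bytes.
    int findMatchingLevel(const std::vector<uint8_t>& stream,
                          const std::vector<uint8_t>& raw) {
        for (int level = 1; level <= 9; ++level) {
            std::vector<uint8_t> out(compressBound(raw.size()));
            uLongf outLen = out.size();
            if (compress2(out.data(), &outLen, raw.data(), raw.size(),
                          level) != Z_OK)
                continue;
            if (outLen == stream.size() &&
                std::memcmp(out.data(), stream.data(), outLen) == 0)
                return level;
        }
        return -1;
    }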

  4. #4
    Programmer schnaader
    Quote Originally Posted by Shelwien
    Tried that image.
    Guess it was processed with something like pngout.
    Any ideas about some support for these tricky inflate variations?
    Future versions (preferably the next one) of Precomp will
    add a "lossy" mode. This will just decompress these streams
    and recompress them with the highest settings, so for this example,
    the PNG image data will stay the same, but the recompressed
    PNG file will not be bit-to-bit identical to the original one.
    A more complicated approach would track the zLib algorithm
    and save the differences in the match decisions. That way, every
    stream could be recompressed bit-to-bit identical, but there would
    be additional data for the compressor telling it
    when to "switch compression strategy". For GIF files, this
    is easier and I have an almost working implementation, but
    it is a lot of work, and lossy mode will be a good compromise
    until this works.
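
    A minimal zlib sketch of that lossy mode (illustrative only; the
    function name and the known-raw-size parameter are assumptions):

    Code:
    // Sketch of the planned "lossy" mode: decompress, then recompress with
    // the highest zlib setting. The decoded data stays identical; the
    // compressed bytes generally differ from the original stream.
    #include <zlib.h>
    #include <cstdint>
    #include <vector>
    
    std::vector<uint8_t> recompressLossy(const std::vector<uint8_t>& in,
                                         size_t rawSizeHint) {  // known raw size
        std::vector<uint8_t> raw(rawSizeHint);
        uLongf rawLen = raw.size();
        if (uncompress(raw.data(), &rawLen, in.data(), in.size()) != Z_OK)
            return {};  // not a plain zlib stream
        std::vector<uint8_t> out(compressBound(rawLen));
        uLongf outLen = out.size();
        if (compress2(out.data(), &outLen, raw.data(), rawLen,
                      Z_BEST_COMPRESSION) != Z_OK)
            return {};
        out.resize(outLen);
        return out;
    }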
    http://schnaader.info
    Damn kids. They're all alike.

  5. #5
    Administrator Shelwien
    > Future versions (preferably the next one) of Precomp will
    > add a "lossy" mode. This will just decompress these streams
    > and recompress them with the highest settings, so for this example,
    > the PNG image data will stay the same, but the recompressed
    > PNG file will not be bit-to-bit identical to the original one.

    Imho that's a bad idea.
    That is, lossless recompression would work with any deflate etc.
    block in any file. But for lossy mode you'd actually have to
    support the whole file formats - and change some header fields
    to "unpack" a correct file. Which is obviously impossible for
    blocks embedded in executables and such.

    Well, "recompress with highest settings" and padding to the
    initial size would have probably helped in most cases, but
    the trouble is that in fact practically all the special cases
    would have better compression than available with zlib's
    "highest settings". And I don't think that you can afford
    bruteforce match tuning in decoding.

    > A more complicated approach would assist the zLib algorithm
    > and save differences in the match algorithm. So every stream
    > could be recompressed bit-to-bit identical, but there would
    > be additional data for the compressor that would tell him
    > when to "switch compression strategy".

    I'd say drop zlib and parse the deflate blocks yourself.
    It actually might be simpler than using zlib.

  6. #6
    Programmer schnaader
    Quote Originally Posted by Shelwien
    Imho that's a bad idea.
    That is, lossless recompression would work with any deflate etc.
    block in any file. But for lossy mode you'd actually have to
    support the whole file formats - and change some header fields
    to "unpack" a correct file. Which is obviously impossible for
    blocks embedded in executables and such.
    Yes, typical lossy-mode applications are single image files or
    ZIP/JAR archives; recompressing big container files or
    similar data structures will almost certainly lead to some kind
    of data loss.

    Quote Originally Posted by Shelwien
    I'd say drop zlib and parse the deflate blocks yourself.
    It actually might be simpler than using zlib.
    Two things come to my mind after revisiting RFC 1951:
    1. Trying to control the Huffman codes to make sure that, when
    recompressing, the same Huffman codes are used; perhaps
    this can solve some problems with optimized streams.
    2. If no recompression match is found, just decode the Huffman
    codes. I'm not sure if this will help compression much, but
    perhaps something like 5-10% gain could be possible. Has
    anyone had experience with this?
    http://schnaader.info
    Damn kids. They're all alike.

  7. #7
    Member namq
    Hi!

    Would it be possible in paq8o8pre to use PackJPG if the image is progressive and paq8o8 otherwise? It would improve compression at low cost. BTW, are there any plans to add progressive JPEG compression to paq8o8?

    Second thing: there's a new PackJPG version, 2.3a, which solves some crash problems.

  8. #8
    Programmer schnaader
    Quote Originally Posted by namq
    Would it be possible in paq8o8pre to use PackJPG if the image is progressive and paq8o8 otherwise? It would improve compression at low cost. BTW, are there any plans to add progressive JPEG compression to paq8o8?
    Nice idea. Could get a bit tricky to make paq8o8 and Precomp
    communicate about who should handle which JPG. I think I'd
    better let Precomp detect progressive JPGs and add a
    switch like -onlyprogressivejpgs.

    Quote Originally Posted by namq
    Second thing: there's a new PackJPG version, 2.3a, which solves some crash problems.
    Paq8o8pre v2 already uses PackJPG 2.3a (it's even 2.3b, fixing
    some DLL-related bugs); Precomp will use it with the next
    version, v0.3.8.
    http://schnaader.info
    Damn kids. They're all alike.

  9. #9
    Administrator Shelwien
    > Two things come to my mind after revisiting RFC 1951:
    > 1. Trying to control the Huffman codes to make sure that, when
    > recompressing, the same Huffman codes are used; perhaps
    > this can solve some problems with optimized streams.

    They're static Huffman codes, so there's no problem at all.
    You can always rebuild them after decoding the actual data,
    and in the weird case where your predicted codes don't match
    the ones actually used, you'd have to encode the difference too.
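
    A sketch of that rebuilding step (illustrative; deflate's 15-bit
    length limit and the canonical code assignment of RFC 1951, 3.2.2
    are omitted here):

    Code:
    // Sketch of "rebuild the Huffman table from the decoded data": derive
    // code lengths from symbol frequencies with the classic two-lightest
    // merge. Deflate would additionally limit lengths to 15 bits.
    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>
    
    std::vector<int> huffmanCodeLengths(const std::vector<uint64_t>& freq) {
        using Node = std::pair<uint64_t, int>;          // (weight, node id)
        std::priority_queue<Node, std::vector<Node>, std::greater<Node>> pq;
        int n = (int)freq.size();
        std::vector<int> parent(n, -1);                 // leaves are ids 0..n-1
        for (int i = 0; i < n; ++i)
            if (freq[i] > 0) pq.push({freq[i], i});
        while (pq.size() > 1) {                         // merge two lightest nodes
            Node a = pq.top(); pq.pop();
            Node b = pq.top(); pq.pop();
            int id = (int)parent.size();
            parent.push_back(-1);
            parent[a.second] = parent[b.second] = id;
            pq.push({a.first + b.first, id});
        }
        std::vector<int> len(n, 0);                     // code length = leaf depth
        for (int i = 0; i < n; ++i)
            for (int p = parent[i]; p != -1; p = parent[p]) ++len[i];
        return len;  // a single-symbol alphabet would still need length 1
    }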

    > 2. If no recompression match is found, just decode the Huffman
    > codes. I'm not sure if this will help compression much, but
    > perhaps something like 5-10% gain could be possible.

    That's true, deflate matches can be compressed better too.
    But there is no sense in doing it like that, as matches are
    derived data anyway.
    So again, it should be done by running a real LZ encoder
    (in both the decoding and encoding stages!) which would let you
    see the available options; then you only need to
    encode the difference between your estimate and the real match -
    and most of the time there won't be much choice.
    That way you'll be able to predict matches using zlib's encoding
    strategies, but not be limited by them. If you fail to predict
    the exact value with your model, you just encode the difference;
    that's pretty normal for compression.
    So, though it's technically possible to support bruteforce-optimized
    match sequences, it's probably better to just encode them based on
    some less optimal strategy, which will be redundant, but you won't
    need to wait hours for decoding.
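
    Something like the following sketch (purely illustrative, all names
    made up): run your own matcher at each position and encode only how
    the actual match in the stream deviates from the predicted candidates.

    Code:
    // Encode the stream's actual match as a delta against our own matcher's
    // ranked candidates. encodeFlag/encodeIndex stand in for whatever
    // entropy coder is used.
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    
    struct Match { uint32_t dist; uint32_t len; };
    
    void encodeMatchDelta(const Match& actual,
                          const std::vector<Match>& candidates, // best first
                          void (*encodeFlag)(bool),
                          void (*encodeIndex)(size_t)) {
        for (size_t i = 0; i < candidates.size(); ++i) {
            if (candidates[i].dist == actual.dist &&
                candidates[i].len == actual.len) {
                encodeFlag(i == 0);          // usually the top candidate hits
                if (i != 0) encodeIndex(i);  // otherwise pay for its rank
                return;
            }
        }
        encodeFlag(false);
        encodeIndex(candidates.size());      // escape: match outside our set,
                                             // would be coded raw (not shown)
    }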

    > Has anyone had experience with this?

    Well, guess I'd written something pretty similar once.

  10. #10
    Programmer schnaader
    Quote Originally Posted by Shelwien
    They're static Huffman codes, so there's no problem at all.
    You can always rebuild them after decoding the actual data,
    and in the weird case where your predicted codes don't match
    the ones actually used, you'd have to encode the difference too.
    The "weird case" doesnt seem that weird. For example, the deflate
    stream in the image you posted above starts with:

    8C 7C

    Having a look at the lowest 3 bits of 8C gives "100". This means
    (RFC 1951, Section 3.2.3) BFINAL = 0 and BTYPE = 10, so
    the stream is compressed with dynamic huffman codes. If I check
    all the 81 streams I get at recompression, every one of them has
    those same 3 bits, but not even one of them starts with "8C", so
    the huffman codes are different...
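
    Checking those 3 header bits (RFC 1951, Section 3.2.3) is a one-liner:

    Code:
    // Check the first 3 bits of a deflate block: bit 0 is BFINAL, bits 1-2
    // are BTYPE (read LSB first). 0x8C gives BFINAL=0, BTYPE=2 (dynamic).
    #include <cstdint>
    #include <cstdio>
    
    int main() {
        uint8_t firstByte = 0x8C;
        int bfinal = firstByte & 1;
        int btype  = (firstByte >> 1) & 3;
        const char* type = btype == 0 ? "stored"
                         : btype == 1 ? "fixed Huffman"
                         : btype == 2 ? "dynamic Huffman" : "reserved";
        std::printf("BFINAL=%d BTYPE=%d (%s)\n", bfinal, btype, type);
        return 0;
    }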

    But you are right: if I start parsing the Huffman codes, it's not a
    big step to parse the whole stream, so I'll go this way.
    http://schnaader.info
    Damn kids. They're all alike.

  11. #11
    Administrator Shelwien
    >> They're static Huffman codes, so there's no problem at all.
    >> You can always rebuild them after decoding the actual data,
    >> and in the weird case where your predicted codes don't match
    >> the ones actually used, you'd have to encode the difference too.
    >
    > The "weird case" doesn't seem that weird.

    By "weird case" I meant a case where huffman table is not optimal
    for given data. Also by "static" I meant non-adaptive.

    > Having a look at the lowest 3 bits of 8C gives "100". This means
    > (RFC 1951, Section 3.2.3) BFINAL = 0 and BTYPE = 10, so
    > the stream is compressed with dynamic Huffman codes. If I check
    > all the 81 streams I get at recompression, every one of them has
    > those same 3 bits, but not even one of them starts with "8C", so
    > the Huffman codes are different...

    That's pretty obvious, as pngout, which apparently produced that
    file, doesn't use zlib.

    > But you are right: if I start parsing the Huffman codes, it's not a
    > big step to parse the whole stream, so I'll go this way.

    You only need to follow the generalized encoding algorithm,
    which takes a block, turns it into a sequence of matches,
    calculates the Huffman table (supposedly optimal, which greatly
    narrows the possibilities) and writes the code.
    So there's a sequence of decisions taken by the encoder while parsing
    the given data... and there's a fairly limited set of possible
    decisions at each step, because it's impossible to insert a
    completely random match etc. And these sets can be narrowed even
    further based on the supposed encoding strategy. Or, better said,
    you can use the strategy-based conditions as contexts in a model
    of your compressor for the extra information contained in the deflate
    stream, given the data encoded in the stream.
