Thread: lossless PDF compressor

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    86
    Thanks
    33
    Thanked 8 Times in 8 Posts

    lossless PDF compressor

    Just curious: are there any specialized lossless compressors for PDF files? I am not talking about existing PDF optimizers that strip fonts and metadata and still output a PDF. Deflate sometimes works well and sometimes doesn't. I'm not an expert in the PDF format, but I figure it must have some structure that can be exploited for better compression than general dictionary-based algorithms achieve, especially for PDF files with many tables and forms generated by office tools. Any ideas or pointers?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,588
    Thanks
    251
    Thanked 1,134 Times in 623 Posts
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
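
    To illustrate the idea behind this kind of recompression, here is a minimal Python sketch (not precomp's actual code). It looks for zlib/deflate streams after a "stream" keyword, inflates them, and then checks whether plain zlib at some level reproduces the original bytes exactly. If it does, only the raw data and the level need to be stored. The function names are made up for this example, and the scan ignores /Length, filter chains and zlib parameters other than the level.

```python
import re
import zlib

# "stream" followed by EOL starts stream data; the lookbehind skips "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")


def find_deflate_streams(pdf: bytes):
    """Yield (offset, compressed_bytes, raw_bytes) for each zlib stream found."""
    for match in STREAM_START.finditer(pdf):
        start = match.end()
        inflater = zlib.decompressobj()
        try:
            raw = inflater.decompress(pdf[start:])
        except zlib.error:
            continue  # not zlib data (e.g. a JPEG or an uncompressed stream)
        if not inflater.eof:
            continue  # the deflate stream never terminated
        # Everything the inflater did not consume lies after the stream.
        comp_len = len(pdf) - start - len(inflater.unused_data)
        yield start, pdf[start:start + comp_len], raw


def find_zlib_level(raw: bytes, compressed: bytes):
    """Return a zlib level that recreates `compressed` bit for bit, or None."""
    for level in range(1, 10):
        if zlib.compress(raw, level) == compressed:
            return level
    return None
```

    Streams for which no level matches (written by a different deflate implementation, for example) would have to be stored unchanged or handled with extra correction data.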

  3. Thanks:

    smjohn1 (15th January 2020)

  4. #3
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    164
    Thanks
    45
    Thanked 11 Times in 11 Posts
    Quote Originally Posted by Shelwien
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
    Kinda off-topic, but precomp was the first tool for recompressing already compressed data (with exact restoration) that I ever found, back when I started looking for compression tools for videos and images around 2012. Six years later, I created an account here.

    Or, if you have the original document and the PDF is just the result of an export command, you can compress the text (roughly an 80% ratio on text) and the other data separately.

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    729
    Thanks
    212
    Thanked 270 Times in 160 Posts

  6. #5
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    86
    Thanks
    33
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Shelwien
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
    Thanks, I will play with it and see how it works on PDF files.

  7. #6
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    86
    Thanks
    33
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by CompressMaster
    Kinda off-topic, but precomp was the first tool for recompressing already compressed data (with exact restoration) that I ever found, back when I started looking for compression tools for videos and images around 2012. Six years later, I created an account here.

    Or, if you have the original document and the PDF is just the result of an export command, you can compress the text (roughly an 80% ratio on text) and the other data separately.
    Thanks. I think a few major PDF creators already have this type of compression built in. What I am asking about is how to "recompress" an existing PDF file.

  8. #7
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    86
    Thanks
    33
    Thanked 8 Times in 8 Posts
    Quote Originally Posted by Jyrki Alakuijala
    Thanks, I did find this package too. But perhaps I didn't make it clear: my input is a PDF file, and after (re)compression the output doesn't need to be a PDF, but it must be possible to decompress it back to the exact original PDF file, bit for bit. I guess this is NOT what this tool does.
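
    To make the requirement concrete, here is a small Python check (just a sketch, all names are made up for this example). Re-encoding a deflate stream the way an optimizer might keeps the content but changes the bytes, so it fails a bitwise comparison, while a pack/unpack pair such as lzma passes it.

```python
import hashlib
import lzma
import zlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_bitwise_roundtrip(original: bytes, pack, unpack) -> bool:
    """True if unpack(pack(original)) restores the original bit for bit."""
    return sha256(unpack(pack(original))) == sha256(original)


def reencode_stream(stream: bytes) -> bytes:
    """Stand-in for an optimizer: same content, new deflate encoding."""
    return zlib.compress(zlib.decompress(stream), 9)


def keep_as_is(data: bytes) -> bytes:
    """An optimizer has no restore step; its output is used directly."""
    return data


# A stream as some PDF writer might have stored it (fast deflate setting).
stream = zlib.compress(b"(Hello) Tj\n" * 200, 1)

print(is_bitwise_roundtrip(stream, reencode_stream, keep_as_is))
print(is_bitwise_roundtrip(stream, lzma.compress, lzma.decompress))
```

    The first check fails even though the decoded content is identical, which is the difference between "lossless" in the optimizer sense and the bit-exact restoration I need.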

  9. #8
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    188
    Thanks
    132
    Thanked 12 Times in 11 Posts
    Yeah, there are open issues. You can contribute there, for example by making a new Windows build for the community :)

  10. #9
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    338
    Thanks
    36
    Thanked 36 Times in 21 Posts
    Precomp is your best option, as it can recognize most of the streams. If you have more time, you can optimize the PDF file first, which can find duplicate streams and remove streams that are not required or not visible.

    Besides precomp there is StuffIt, whose archives can be unpacked on multiple systems including Windows/Mac/iOS.

    Another option is MFilter with 7-Zip, as it works best on scanned PDF files with JPG streams.

  11. #10
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    575
    Thanks
    220
    Thanked 215 Times in 101 Posts
    Precomp is the tool of choice for losslessness and speed. If speed and memory usage don't matter, paq8px and cmix give smaller results.

    For the sake of completeness, here's a list of things that Precomp doesn't handle (yet) and whether they are planned for upcoming versions.

    - General image compression will improve with the usage of FLIF (and/or webp/pik), see issue #44, planned for >0.4.8
    - Images with predictors aren't handled yet, planned for >0.4.8
    - CCITT/JBIG/JPEG2000 aren't handled and it's unlikely they will be; JPEG2000 in particular is a hell of a complex format
    - "Encrypted" PDFs (using the known Adobe key) aren't handled yet, planned for >= 0.4.8
    - ASCII-/LZW-/Base85 encoded streams, see issue #3, planned for >= 0.4.8
    - Postscript Type1 fonts are snake oil encrypted and can be compressed much better after decryption, planned for 0.4.8
    - Most of the xref tables can be generated from PDF content, planned for 0.4.8
    - JPEG compression and decompression will be much faster and multi-threaded in 0.4.8 by using brunsli, already implemented in the development version
    - Multi-threaded reconstruction of deflate streams will also be implemented in 0.4.8
    - Features for extracting and analyzing streams will be implemented in 0.4.8. This will help in two ways: First, it can save decompressed streams to disk (e.g. PDF images as BMP/PNG/JPG) so you can see which images are the biggest in size and resolution. Second, it will also be available in a reversible variant that can be used to compress the extracted content with other tools, e.g. FLIF/webp for images
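
    As an illustration of the xref point in the list above, here is a rough Python sketch (not Precomp code) of how a classic cross-reference table could be regenerated from the object offsets in the file body. It assumes object numbers run contiguously from 1, that object 0 is the only free entry, and that "N G obj" at the start of a line is always a real object header, which a real implementation could not take for granted. The function name is made up for this example.

```python
import re

# "<number> <generation> obj" at the start of a line marks an indirect object.
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def rebuild_xref(body: bytes) -> bytes:
    """Build a classic xref table from the byte offsets of objects in `body`."""
    offsets = {}
    for match in OBJ_HEADER.finditer(body):
        # Later definitions of the same object number override earlier ones.
        offsets[int(match.group(1))] = (match.start(), int(match.group(2)))
    if not offsets:
        raise ValueError("no objects found")

    size = max(offsets) + 1
    # Entry 0 is the head of the free list; each entry is exactly 20 bytes.
    lines = [b"xref\n", b"0 %d\n" % size, b"0000000000 65535 f \n"]
    for number in range(1, size):
        if number not in offsets:
            raise ValueError("object %d missing, sketch assumes no gaps" % number)
        offset, generation = offsets[number]
        lines.append(b"%010d %05d n \n" % (offset, generation))
    return b"".join(lines)
```

    A compressor would only drop the stored table when the regenerated one matches it byte for byte, and keep the original otherwise.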
    Last edited by schnaader; 15th January 2020 at 12:30. Reason: added some links
    http://schnaader.info
    Damn kids. They're all alike.

  12. Thanks (3):

    Mike (15th January 2020),Shelwien (15th January 2020),smjohn1 (15th January 2020)

  13. #11
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    338
    Thanks
    36
    Thanked 36 Times in 21 Posts
    Can you also add the option of using Precomp as a filter for 7-Zip (like MFilter did)?
