Thread: lossless PDF compressor

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts

    lossless PDF compressor

    Just curious: are there any specialized lossless compressors for PDF files? I am not talking about existing PDF optimizers that remove fonts and metadata and still output a PDF. Deflate sometimes works well and sometimes doesn't. I'm not an expert in the PDF format, but I figure it must have some structure that can be exploited for better compression than general dictionary-based algorithms achieve, especially for PDF files with many tables and forms generated by office tools. Any ideas or pointers?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,972
    Thanks
    296
    Thanked 1,299 Times in 736 Posts
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
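
For anyone wondering what "handling" a deflate stream involves, here is a rough Python sketch of the general idea, not Precomp's actual code: locate stream bodies in the PDF, inflate the ones that are valid zlib data, and check whether plain zlib at some level reproduces the original bytes exactly. The names `find_flate_streams` and `reproducing_level` are made up for illustration, and the `stream`/`endstream` scan is a simplification (a real parser would rely on the `/Length` entry).

```python
import re
import zlib


def find_flate_streams(pdf_bytes):
    """Yield (offset, raw, decoded) for every stream body zlib can inflate.

    Assumption: stream bodies sit between 'stream' + EOL and 'endstream'.
    """
    # The lookbehind keeps 'endstream' from being mistaken for 'stream'.
    for match in re.finditer(rb"(?<!end)stream\r?\n", pdf_bytes):
        start = match.end()
        end = pdf_bytes.find(b"endstream", start)
        if end == -1:
            continue
        inflater = zlib.decompressobj()
        try:
            decoded = inflater.decompress(pdf_bytes[start:end])
        except zlib.error:
            continue  # not a zlib/deflate stream (e.g. a JPEG)
        if not inflater.eof:
            continue  # truncated or otherwise incomplete stream
        # Drop the EOL bytes that sit between the data and 'endstream'.
        raw_end = end - len(inflater.unused_data)
        yield start, pdf_bytes[start:raw_end], decoded


def reproducing_level(raw, decoded):
    """Return a zlib level that recreates `raw` bit for bit, or None."""
    for level in range(1, 10):
        if zlib.compress(decoded, level) == raw:
            return level
    return None
```

If `reproducing_level` finds a match, a recompressor only needs to store the decoded data plus the level, and a stronger compressor can then work on the decoded data. Streams written by other deflate implementations will often not match any level, which is where the real tools have to do more work.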

  4. #3
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    198
    Thanks
    58
    Thanked 15 Times in 15 Posts
    Originally posted by Shelwien:
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
    Kinda off-topic, but it was the first tool for compressing already-compressed data (with decompression) that I found when I started looking for compression tools for videos and images around 2012. Six years later, I created an account here.

    Or, if you have the original document and the PDF is just the result of an export command, you can compress the text (approx. 80% ratio for text) and the other data separately.

  5. #4
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    875
    Thanks
    242
    Thanked 324 Times in 197 Posts

  6. #5
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Originally posted by Shelwien:
    There's precomp: https://github.com/schnaader/precomp-cpp/releases
    PDFs can contain deflate streams, jpegs (maybe png/gif too), base64 - these are all handled by precomp.
    Thanks, I will play with it and see how it works on PDF files.

  7. #6
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Originally posted by CompressMaster:
    Kinda off-topic, but it was the first tool for compressing already-compressed data (with decompression) that I found when I started looking for compression tools for videos and images around 2012. Six years later, I created an account here.

    Or, if you have the original document and the PDF is just the result of an export command, you can compress the text (approx. 80% ratio for text) and the other data separately.
    Thanks. I think a few major PDF creators already have this type of compression built in. What I am talking about is "recompressing" an existing PDF file.

  8. #7
    Member
    Join Date
    Feb 2016
    Location
    USA
    Posts
    98
    Thanks
    36
    Thanked 8 Times in 8 Posts
    Originally posted by Jyrki Alakuijala:
    Thanks, I did find this package too. But perhaps I didn't make it clear: my input is a PDF file, and after (re)compression the output doesn't need to be a PDF, but it must decompress back to the exact original PDF file, bit for bit. I guess this is NOT what this tool does.
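
To make that requirement concrete, here is a minimal sketch of how one could check it for any candidate tool. The helper name `roundtrip_report` is hypothetical, and `lzma` from the Python standard library is only a stand-in for whatever compressor is being evaluated.

```python
import hashlib
import lzma


def roundtrip_report(original, compress, decompress):
    """Return (is_bit_exact, packed_size) for one compress/decompress pair."""
    packed = compress(original)
    restored = decompress(packed)
    # Comparing digests is equivalent to comparing the bytes directly, but
    # it is the form you would use when only checksums of files are kept.
    same = hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
    return same, len(packed)


if __name__ == "__main__":
    sample = b"%PDF-1.4\n" + b"0 0 m 100 100 l S\n" * 500
    print(roundtrip_report(sample, lzma.compress, lzma.decompress))
```

A PDF optimizer that rewrites the file fails this check by design, since its output cannot be turned back into the original bytes.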

  9. #8
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    198
    Thanks
    135
    Thanked 13 Times in 12 Posts
    Yeah, there are issues, and you can contribute there, for example by making a new Windows build for the community :)

  10. #9
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    354
    Thanks
    37
    Thanked 38 Times in 23 Posts
    Precomp is your best option, as it can recognize most of the streams. If you have more time, you can optimize the PDF file first, which can find duplicate streams and remove streams that are not required or not visible.

    Besides Precomp, there is StuffIt, whose archives can be unpacked on multiple systems, including Windows/Mac/iOS.

    Another option is to use Mfilter with 7-Zip, which works best on scanned PDF files with JPG streams.

  11. #10
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    615
    Thanks
    260
    Thanked 242 Times in 121 Posts
    Precomp is the tool of choice for losslessness and speed. If speed and memory usage don't matter, paq8px and cmix give smaller results.

    For the sake of completeness, here's a list of things that Precomp doesn't handle (yet) and whether they are planned for upcoming versions.

    - General image compression will improve with the usage of FLIF (and/or webp/pik), see issue #44, planned for >0.4.8
    - Images with predictors aren't handled yet, planned for >0.4.8
    - CCITT/JBIG/JPEG2000 aren't handled and it's unlikely they will be; JPEG2000 in particular is a hell of a complex format
    - "Encrypted" PDFs (using the known Adobe key) aren't handled yet, planned for >= 0.4.8
    - ASCII-/LZW-/Base85 encoded streams, see issue #3, planned for >= 0.4.8
    - Postscript Type1 fonts are snake oil encrypted and can be compressed much better after decryption, planned for 0.4.8
    - Most of the xref tables can be generated from PDF content, planned for 0.4.8
    - JPEG compression and decompression will be much faster and multi-threaded in 0.4.8 by using brunsli, already implemented in the development version
    - Multi-threaded reconstruction of deflate streams will also be implemented in 0.4.8
    - Features for extracting and analyzing streams will be implemented in 0.4.8. This will help in two ways: First, it can save decompressed streams to disk (e.g. PDF images as BMP/PNG/JPG) so you can see which images are the biggest in size and resolution. Second, it will also be available in a reversible variant that can be used to compress the extracted content with other tools, e.g. FLIF/webp for images
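
To illustrate the Type1 font item in the list above: the sketch below implements the eexec cipher with the constants published in Adobe's Type 1 font format documentation (key 55665, multiplier 52845, increment 22719) and compares how well zlib does on the encrypted and the decrypted bytes. This is only a demonstration of the principle, not Precomp code; the sample plaintext is made up, and real fonts also carry a few random leading bytes and often a hex encoding that this sketch ignores.

```python
import zlib

C1, C2 = 52845, 22719   # cipher constants from the Type 1 format description
EEXEC_KEY = 55665       # publicly documented key for the eexec section


def eexec_encrypt(plain, key=EEXEC_KEY):
    """Encrypt bytes with the Type 1 eexec scheme (16-bit rolling key)."""
    r = key
    out = bytearray()
    for p in plain:
        c = p ^ (r >> 8)
        out.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF  # key update depends on ciphertext
    return bytes(out)


def eexec_decrypt(cipher, key=EEXEC_KEY):
    """Invert eexec_encrypt; no secret is needed since the key is public."""
    r = key
    out = bytearray()
    for c in cipher:
        out.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(out)


if __name__ == "__main__":
    # Made-up, repetitive stand-in for the private part of a font program.
    plain = b"/CharStrings 100 dict dup begin\n" * 50
    cipher = eexec_encrypt(plain)
    assert eexec_decrypt(cipher) == plain
    print("encrypted, zlib:", len(zlib.compress(cipher)))
    print("decrypted, zlib:", len(zlib.compress(plain)))
```

Because every key update depends on the previous ciphertext byte, repeated plaintext stops looking repeated once encrypted, so a general-purpose compressor finds much less to work with until the data is decrypted.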
    Last edited by schnaader; 15th January 2020 at 11:30. Reason: added some links
    http://schnaader.info
    Damn kids. They're all alike.

  13. #11
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    354
    Thanks
    37
    Thanked 38 Times in 23 Posts
    Could you also add the option of using "Precomp" as a filter for 7-Zip (like Mfilter did)?
