Reduce Filesize of PDF-embedded Bitmap Images with Ghostscript
Ghostscript is a powerful tool for manipulating PDF and PS files. But with great power comes great complexity. Here are examples of embedding fonts and reducing image size with it!
Embedding Fonts
Usually, your PDF typesetting program takes care of embedding fonts into a PDF document (pdfLaTeX does), but sometimes PDFs come from stranger sources: my ROOT-generated plots, for example, do not embed their fonts [1].
In a blog post, Karl Rupp summarizes how to embed fonts into PDFs from different sources. To really embed ALL the fonts, including those Ghostscript usually ignores, you have to dive in even deeper. Here is the command, which I found in a Stack Overflow reply:
gs -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf -c ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams" -f input.pdf
A quicker alternative to Ghostscript is the pdftocairo command of the poppler PDF library. The command enables conversion to different vector graphics formats [2]. But it can also convert from PDF to PDF, embedding the fonts in the process.
pdftocairo input.pdf -pdf output.pdf
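To check whether the fonts really ended up in the file, poppler's pdffonts tool lists every font together with an "emb" column. The following is a minimal sketch that filters that table for fonts reported as not embedded. It assumes pdffonts prints its usual layout (two header lines, then one row per font ending in emb, sub, uni and a two-part object ID) and that font names contain no spaces; list_unembedded_fonts is a made-up helper name, not part of poppler.

```shell
#!/bin/sh
# Hypothetical helper: print the names of fonts that pdffonts reports
# as NOT embedded. Reads the pdffonts table on stdin.
list_unembedded_fonts() {
    # Skip the two header lines. The font type may contain spaces
    # ("Type 1", "CID TrueType"), so count columns from the end:
    # emb is the fifth field from the right (emb, sub, uni, object ID x2).
    awk 'NR > 2 && NF >= 7 && $(NF-4) == "no" { print $1 }'
}

# Usage (assumes pdffonts from poppler is installed):
#   pdffonts output.pdf | list_unembedded_fonts
```

If the function prints nothing, every font in the table was reported as embedded.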
Changing Image Quality
For printing a document, you probably want it in the best quality possible. For uploading it somewhere to share with your friends, file size might be more important than quality. Since text and vector graphics are compact, the bulk of the bits in a LaTeX-set document is usually taken up by bitmap images (raster formats like JPG, PNG, …). Ghostscript offers a batch way to reduce the size of all embedded bitmap-like images.
Everything revolves around the -dPDFSETTINGS=/ setting. It can take different values, ranging from screen, used in the command above (equivalent to 72 dpi images), to prepress (300 dpi). A one-liner to get all images of a document down to 150 dpi would be:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Since I’m lazy and don’t want to memorize this, I made a small, encapsulating shell script a while ago to reduce the PDF’s size by means of image compression: reducePdfSize.sh.
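For illustration, here is a rough sketch of what such a wrapper could look like. It is not the actual reducePdfSize.sh: the function names preset_dpi and reduce_pdf and the GS_BIN override are assumptions made up for this example. The dpi values are the ones Ghostscript documents for its presets.

```shell
#!/bin/sh
# Sketch of a wrapper around the Ghostscript one-liner.
# All names here (preset_dpi, reduce_pdf, GS_BIN) are hypothetical.

# Print the image resolution (dpi) a -dPDFSETTINGS preset targets.
# Returns non-zero for presets this sketch does not know.
preset_dpi() {
    case "$1" in
        screen)   echo 72 ;;
        ebook)    echo 150 ;;
        printer)  echo 300 ;;
        prepress) echo 300 ;;
        *)        return 1 ;;
    esac
}

# reduce_pdf INPUT OUTPUT [PRESET]
# Rewrites INPUT to OUTPUT with the given preset (default: ebook).
reduce_pdf() {
    preset=${3:-ebook}
    if ! preset_dpi "$preset" > /dev/null; then
        echo "unknown preset: $preset" >&2
        return 1
    fi
    # GS_BIN lets you swap in another binary, e.g. GS_BIN=echo for a dry run.
    "${GS_BIN:-gs}" -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        "-dPDFSETTINGS=/$preset" -dNOPAUSE -dQUIET -dBATCH \
        "-sOutputFile=$output_guard$2" "$1"
}

# Usage after sourcing this file:
#   reduce_pdf thesis.pdf thesis_small.pdf printer
```

Quoting the file names keeps paths with spaces intact, and rejecting unknown presets up front avoids handing Ghostscript a setting it does not recognize. The output_guard variable is intentionally left unset and expands to nothing; drop it if you adapt the sketch.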
Running pdfimages -all on my thesis, which is 41 MB in total, extracts about 21 MB of images: half of the data in the PDF of my thesis is bitmap images. Running the above Ghostscript command on thesis.pdf, with the printer option [3] in place of ebook, reduces the 41 MB to 15 MB.
Not bad, right?
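To put a number on it, the saving can be worked out with shell arithmetic. This is a small sketch; percent_saved is a hypothetical helper name, and it uses integer division, so the result is rounded down.

```shell
#!/bin/sh
# Hypothetical helper: percentage saved when a file shrinks from
# size $1 to size $2 (both integers in the same unit).
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# With the numbers from the thesis (41 MB down to 15 MB):
#   percent_saved 41 15    # prints 63
# With real files, compare byte counts instead:
#   percent_saved "$(wc -c < thesis.pdf)" "$(wc -c < output.pdf)"
```

So the 41 MB to 15 MB reduction corresponds to a saving of roughly 63 percent.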