Optimize PDF files - Part I

Large PDF files are slow to download and consume too much bandwidth. You can create smaller PDF files by following a few easy rules, often cutting the file size by several hundred kilobytes. This article also goes into the details of using the Save As PDF command in Word 2007 and 2010.

PDF files are often unnecessarily large. It can take hundreds of kilobytes or even megabytes to represent a small amount of actual content. Downloads run at a snail's pace. Web servers run hot. Bandwidth quotas are exceeded and web sites become slow to respond.

What's causing the PDF bloat? A regular PDF file consists primarily of text, images and fonts.

Regular PDF: text, images, fonts

Obviously, you could optimize by writing less text! This isn't what we're after. Effective optimization means cutting down the font and image parts. It is possible to drop the font part entirely and shrink the image part while keeping the useful textual data intact (or almost intact). This article shows you how.

Optimized PDF: text, images

There are several ways to produce PDF files. One can use a PDF printer driver, for example. This article focuses on the Save As PDF command in Microsoft Word 2007 and 2010. Many of the tricks are also applicable to other PDF writers. Because PDF writers differ in the details, you need to experiment to find out how the rules work with your PDF writer.

Saving as PDF is a built-in feature in Word 2010. To enable it in Word 2007, you may need a free add-in from Microsoft. The add-in is titled 2007 Microsoft Office Add-in: Microsoft Save as PDF or XPS. You can download it from Microsoft's web site.

Font optimization

Font issues are crucial to PDF optimization. A simple PDF may easily store around 200 kB of font data. It is possible to go without storing any font data at all. By planning your font use in advance you get stylish and smaller files.

Rule #1: Use standard fonts

PDF comes with 5 standard font families. The families are Times, Helvetica, Courier, Symbol and ZapfDingbats. All PDF readers support these standard fonts.

For all other fonts, PDF writers normally embed the font data in the PDF file. Embedding means copying. The file includes a copy of the entire font, or a part of it. When a Garamond font is used, for example, the font glyphs get copied in the PDF. This consumes a lot of space.

To see which fonts a PDF file contains, select Properties in the File menu of Adobe Reader and open the Fonts tab. It lists the fonts used in the currently open PDF. Fonts marked (Embedded) or (Embedded Subset) have been embedded in the file; the others have not. As a rule of thumb, the standard fonts are usually not embedded, while all others are. A PDF writer can, however, embed standard fonts or leave other fonts unembedded, depending on its capabilities. With Word 2007/2010, you can either embed all fonts or embed everything except 2 standard fonts. We will go into the details soon.
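If you have the raw font names stored in the file (as a PDF library would report them, rather than Adobe Reader's display labels), the same check can be scripted. The sketch below is only an illustration: `describe_font` is a hypothetical helper, and it relies on two conventions of the PDF format, namely the 14 standard font names and the six-letter tag that prefixes the name of a subset font.

```python
import re

# The 14 standard PDF font names, covering the five standard families.
STANDARD_FONTS = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# A subset font is named with six uppercase letters and a plus sign,
# for example "ABCDEF+Garamond".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def describe_font(base_font_name: str) -> dict:
    """Split a raw PDF font name into plain name, subset flag and standard flag."""
    is_subset = bool(SUBSET_TAG.match(base_font_name))
    plain_name = SUBSET_TAG.sub("", base_font_name)
    return {
        "name": plain_name,
        "subset": is_subset,
        "standard": plain_name in STANDARD_FONTS,
    }


# Made-up font names for illustration
for raw in ["Helvetica-Bold", "ABCDEF+Garamond", "TimesNewRomanPSMT"]:
    print(describe_font(raw))
```

A font that is neither standard nor tagged as a subset may still be embedded in full, so treat the result as a hint and confirm it in the Fonts tab.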

To save space, use the PDF standard fonts. As it happens, they are not installed on Windows (other than Symbol). Fortunately, similar fonts do exist and PDF writer applications are aware of the similarities. You can use Times New Roman in place of Times and Arial in place of Helvetica. The standard fonts and their Windows replacements are listed in the following table.

PDF standard fonts and their replacements

| PDF font | Windows font | Word font | Sample |
| --- | --- | --- | --- |
| Times | Times New Roman | Times New Roman | Times is a serif font |
| Helvetica | Arial | Arial | Helvetica is a sans-serif font |
| Courier | Courier New | none | Courier is a fixed-width font |
| Symbol | Symbol | none | Symbol is, well, a symbol font |
| ZapfDingbats | (ZapfDingbats) | none | ZapfDingbats includes symbols and ornaments |

In Word you can safely use Times New Roman and Arial. Your PDF will use Times and Helvetica, the standard fonts, consuming as few bytes as possible.

Unfortunately this is not true for Courier, Symbol or ZapfDingbats. Word will always embed Courier, Courier New, Symbol and ZapfDingbats. It wouldn't be necessary, really, but Word does that. Too bad!
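Word's behavior as described here boils down to a small lookup. The following sketch is a hypothetical helper, not anything Word exposes; it assumes the text uses only characters that do not force embedding.

```python
from typing import Tuple

# Word fonts that Word 2007/2010 maps to a PDF standard font without embedding,
# according to the table above.
WORD_UNEMBEDDED = {"Times New Roman": "Times", "Arial": "Helvetica"}


def pdf_font_for(word_font: str) -> Tuple[str, bool]:
    """Return (font name used in the PDF, whether font data is embedded)."""
    if word_font in WORD_UNEMBEDDED:
        return WORD_UNEMBEDDED[word_font], False
    # Everything else, including Courier New, Symbol and ZapfDingbats, is embedded.
    return word_font, True


for font in ["Arial", "Times New Roman", "Courier New", "Garamond"]:
    print(font, "->", pdf_font_for(font))
```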

PDF writers other than Word may well support all the standard fonts, including Courier, Symbol and ZapfDingbats. By creating the PDF with a PDF printer driver you may be able to avoid embedding Courier, Symbol or ZapfDingbats.

Rule #2: Use fewer fonts

When Times New Roman (Times) and Arial (Helvetica) are not enough, you will end up embedding font data into the PDF. That would be fine, except that it adds at least tens of kilobytes for each font used.

It pays off to use as few fonts as possible. Using just a few fonts will produce visually appealing output too. A good number of font families (font names) is usually 1 to 3 per document. Use one font family for body text, maybe another for headings. A third font family may be appropriate for image captions or special effects. It is perfectly OK to use just one font family for everything. Overuse of different fonts makes a document look inconsistent. What is more, it bloats the file.

Rule #3: Use fewer font styles

It is important to notice that Regular, Italic, Bold and Bold Italic are different fonts to PDF. Each of them will need to be embedded separately. If you use all the 4 styles, you end up embedding the font data 4 times: the Regular, Italic, Bold and Bold Italic font data.
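To see how quickly styles add up, you can count the distinct family and style pairs in a document. This is a minimal sketch under the assumption that Times New Roman and Arial are the only fonts left unembedded, as with Word 2007/2010; the usage list is made up for illustration.

```python
from typing import Iterable, Set, Tuple

# Fonts that Word 2007/2010 maps to PDF standard fonts instead of embedding
NOT_EMBEDDED = {"Times New Roman", "Arial"}


def embedded_faces(used: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the distinct (family, style) pairs that would be embedded."""
    return {(family, style) for family, style in used if family not in NOT_EMBEDDED}


usage = [
    ("Garamond", "Regular"),
    ("Garamond", "Italic"),
    ("Garamond", "Bold"),
    ("Garamond", "Bold Italic"),
    ("Garamond", "Regular"),  # using the same face again costs nothing extra
    ("Arial", "Bold"),        # mapped to Helvetica, not embedded
]
faces = embedded_faces(usage)
print(len(faces))  # 4
```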

Use as few styles as possible to keep the file size down. To emphasize text, use either Italic or Bold. Don't mix both. Pick your preference and be consistent. You don't want your documents to look like mixed character soup anyway. Readers like a consistent style with few but carefully chosen effects.

Italic, Bold and Bold Italic are expensive ways to emphasize text. Fortunately, there are some free styles too. It doesn't add many bytes to change the font size, write in a different color or add an underline. You can use Small Caps or adjust the letter spacing. To emphasize a block of text, indentation can be used.

Font size. Use different sizes for different heading levels, endnotes and image captions. Heading levels are easier to tell apart when their size difference is 3pt or more.

Color. Use colors wisely. One extra color is enough. Dark color on a white background is easiest to read.

Underline. Use underlining carefully. It is not considered very stylish, really. Underlined words may look like links.

Small Caps are a stylish option. Their readability is not the best, though. Use small caps with short pieces of text. Note: Avoid using a special small caps font. It will consume more bytes, not less, as you need to embed the small caps font data too. In Word you can safely use the Small caps checkbox in the Font dialog. It creates small caps from the current font and won't embed an extra font.

Letter spacing can be adjusted. It is relatively uncommon these days, but still an option.

Indentation emphasizes a paragraph.

When appropriate, use these effects in place of Italic or Bold. They may not always be the style you want, though. It's a size vs. style trade-off.

As a practical example, consider heading styles. Many documents have 3 or 4 levels of headings in a specific heading font. Using different combinations of Italic and Bold for the various levels not only makes the document look inconsistent, but also adds to the file size. Try varying the font sizes and colors instead. Perhaps you can use a horizontal rule too. Your document will become stylish and optimized at the same time.

One way to avoid embedding italic or bold fonts is to use an italic or bold version of either Times New Roman (Times) or Arial (Helvetica).

If the body text is in Garamond, you can emphasize with Times New Roman Italic, for example. You could even use Arial Italic, depending on your taste. This saves you from embedding Garamond Italic.

Alternatively, use the heading font to emphasize within body text. Reusing the same font doesn't add anything to the file size.

Rule #4: Use smaller fonts

Some fonts consume more bytes than others. As an example, embedded Consolas produces a smaller file than embedded Courier New.

Switch fonts to find one that creates a small file. You need to experiment to find a small and stylish font.
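The experiment is easy to script if you save the same document once per candidate font into one folder. The helper below is a hypothetical sketch that only compares file sizes; it does not create the PDF files.

```python
from pathlib import Path
from typing import List, Tuple


def rank_by_size(folder: str) -> List[Tuple[str, int]]:
    """List the PDF files in a folder as (file name, size in bytes), smallest first."""
    sizes = [(path.name, path.stat().st_size) for path in Path(folder).glob("*.pdf")]
    return sorted(sizes, key=lambda item: item[1])
```

Calling `rank_by_size("font-trials")`, where `font-trials` is an assumed folder name, returns the trial files with the smallest one first.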

Rule #5: Avoid special characters

When writing text using the standard fonts Times New Roman or Arial, it pays off to use PDF standard characters. As long as you use these "safe" characters, you avoid font embedding.

The opposite happens with special, non-standard characters. They will force font embedding. This happens even if the font is Times New Roman or Arial. What exactly counts as a standard or a special character depends on the PDF writer application. Next we will consider the way Word behaves.

Standard characters. With Word 2007/2010, the standard or "safe" characters consist of the ASCII characters and the Unicode block Latin-1 Supplement. These characters are enough to write English and, for the most part, many European languages. In Unicode terms, the safe characters are the code points U+0000 to U+00FF.

Special characters. All other characters are special, or "unsafe". They require font embedding.

The safe character set is close to the Windows Western character set (Windows-1252 codepage), but not identical. Some of the Windows-1252 characters are not safe. Unfortunately there are some common and useful characters in this group. The following "unsafe" characters will force font embedding:

€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ

This is not a complete list of unsafe characters. Every character outside the safe set is unsafe as well.
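Before saving as PDF, you can scan your text for characters outside the safe set. The sketch below assumes the rule stated above for Word 2007/2010, that is, ASCII plus Latin-1 Supplement (code points up to U+00FF). Other PDF writers may draw the line elsewhere, and `find_unsafe` is a hypothetical helper name.

```python
def is_safe(ch: str) -> bool:
    """True for ASCII and Latin-1 Supplement characters (U+0000 to U+00FF)."""
    return ord(ch) <= 0xFF


def find_unsafe(text: str) -> dict:
    """Map each unsafe character to the list of positions where it occurs."""
    found = {}
    for index, ch in enumerate(text):
        if not is_safe(ch):
            found.setdefault(ch, []).append(index)
    return found


sample = "Café prices: 5 € – “cheap”"
for ch, positions in find_unsafe(sample).items():
    print(f"U+{ord(ch):04X} {ch!r} at {positions}")
```

The accented é passes because it belongs to Latin-1 Supplement, while the euro sign, the en dash and the curly quotes are reported.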

We will now have a closer look at some of these common but unsafe punctuation and typographic characters. Word embeds them even though they are technically PDF standard characters. Use of the following characters will cause font embedding. Fortunately, there are replacements available.
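As a rough sketch of the idea, typographic punctuation can be swapped for plain ASCII before the document is saved. The substitutions below are illustrative assumptions, not a list taken from Word or from any PDF standard; adjust them to your own style rules.

```python
# Illustrative substitutions only: unsafe typographic character -> ASCII stand-in
ASCII_SUBSTITUTES = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201C": '"',    # left double quotation mark
    "\u201D": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u2022": "*",    # bullet
})


def to_safe_punctuation(text: str) -> str:
    """Replace common typographic punctuation with ASCII stand-ins."""
    return text.translate(ASCII_SUBSTITUTES)


print(to_safe_punctuation("\u201cWait\u2026\u201d \u2013 it\u2019s"))
```

This trades typographic polish for file size, the same size vs. style trade-off as with the font effects.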

When you do want to use some of the special characters, try using them in a single font only. This removes the need to embed them several times. As an example, the typographic dashes are identical in a regular and an italic font. Using them in one style only will save a little disk space.

Image optimization

As one might guess, the fewer images in a document, the smaller the file. Use as few as required.

Vector images work much better than bitmap images. A vector image takes less disk space and produces better quality output, both on the screen and on paper. Vector images draw in the maximum available resolution, while bitmaps come with a preset resolution. Since displays and printers have different resolutions, bitmaps are not an ideal choice for PDF.

When you have to use bitmaps, try keeping them as small as possible. Try monochrome bitmaps instead of color bitmaps.
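A quick calculation shows why dimensions and color depth matter. The sketch below computes the uncompressed size of a bitmap; the 800 x 600 dimensions are an arbitrary example, and the bytes actually stored in a PDF will differ because images are normally compressed.

```python
def raw_bitmap_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a bitmap, with each row padded to a whole byte."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


color = raw_bitmap_bytes(800, 600, 24)  # 24-bit RGB
mono = raw_bitmap_bytes(800, 600, 1)    # monochrome, 1 bit per pixel
print(color, mono, color // mono)  # 1440000 60000 24
```

Halving both the width and the height cuts the raw size to a quarter, on top of whatever the color depth saves.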

Word options

Considering file size, what are the best options for saving as PDF with Word 2007/2010?

Minimum size (publishing online) should be selected.

Document structure tags for accessibility should be off, unless your readers require them. This setting is a trade-off between size and accessibility.

Bitmap text when fonts may not be embedded should be unchecked. Bitmapped fonts look bad anyway.

PDF optimizer utilities

There are some utilities for PDF optimization, even free ones. Such utilities often apply compression to the file. This can be a useful additional step. It doesn't do away with the need to deal with the font and image data, though. Therefore, to get the smallest file, follow these optimization rules and then use a PDF optimizer.
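The following sketch hints at why compression helps some parts of a file more than others. It uses Python's `zlib` module, which implements the deflate method that PDF's Flate compression is based on. The page content is made up, and random bytes serve as an assumed stand-in for data that is already compressed, such as a JPEG image.

```python
import os
import zlib

# A made-up, repetitive page content stream: compresses very well
content = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET\n" * 200
compressed = zlib.compress(content, 9)
print(len(content), "->", len(compressed))

# Random bytes standing in for already compressed data: hardly shrink at all
noise = os.urandom(10000)
print(len(noise), "->", len(zlib.compress(noise, 9)))
```

Text and drawing commands shrink dramatically, while data that is already compressed stays about the same size. That is why an optimizer cannot replace choosing fonts and images carefully in the first place.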

Sample PDF files

To prove the point, here are two PDF samples. Both files were produced with Word 2007 with the same settings. No additional applications were used.