HTML to PDF With PhantomJS

2014-10-19 01:15:00 -0400
PhantomJS Tricks: HTML to PDF Conversion

PhantomJS is widely known as the engine behind headless JavaScript
testing. But as a standalone WebKit executable, it also has a screen
capture feature that can render web pages to PNG or PDF. For very
simple document conversions, PhantomJS is a fairly straightforward
tool. But I warn you, severe headaches will occur with any conversion
of substance: repeating headers/footers, images, SVG, fonts, etc.
These issues aren’t mentioned in the sparse documentation or example
snippets, and can lead to some serious frustration. Hopefully I can
help save you a few days with these tricks.

How exactly PhantomJS (or WebKit) structures its measurements
internally is somewhat of a mystery. The easiest way to understand a
PhantomJS document is to think of it as containing two types of pixel
lengths. These are likely OS-specific measurements (I measured them on
an Ubuntu box).

Header / Footer: 1 full page = 2010 pixels

Body: 1 full page = 990 pixels

In other words, a single-page document consisting entirely of “header”
would have a height of 2010 pixels, while one containing only “body”
would be 990 pixels.

Any content that goes in the header or footer needs to be converted
from real (what you see) pixels to “Phantom Pixels” at the 2010/990
ratio. For example, do you have a header with a height of 125 pixels?
That will need to be resized to 125 * 2010/990 pixels, or roughly 254.
Content that goes in the body doesn’t require a size conversion.
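
The conversion can be wrapped in a small helper. This is only a minimal
sketch: the two constants are the page heights measured above (they may
well differ on another OS), and the name toPhantomPixels is made up for
illustration.

```javascript
// Page heights as measured on an Ubuntu box; treat them as assumptions
// and re-measure on your own system.
var HEADER_FOOTER_PAGE_PX = 2010;
var BODY_PAGE_PX = 990;

// Convert a "real" pixel length into the length to use inside a
// header or footer (hypothetical helper name).
function toPhantomPixels(realPx) {
  return realPx * HEADER_FOOTER_PAGE_PX / BODY_PAGE_PX;
}

console.log(Math.round(toPhantomPixels(125))); // 254
```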

I don’t know why these lengths are what they are. I had to iteratively
guess-and-check to find them out.

Page Frame Border

If you want a seamless page frame border, you’ll need to calculate
your pixels for the document, and edit the borders of each component
(header / body / footer) accordingly: the header has no bottom border,
the body has only its side borders visible, and the footer has no top
border.

Take your margins and header/footer heights, and subtract them from
990px. The result is the height of your body. For example:

Margins = 15px each (top and bottom), 30px total

Header = 150px, Footer = 50px

990 - (150 + 50 + 2*15) = 760px
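
The same arithmetic as a function, in case you want to tweak the
numbers while iterating. A minimal sketch: bodyHeight is a hypothetical
name, and 990 is the body page height measured earlier.

```javascript
// Body page height as measured earlier; an assumption, not a spec value.
var BODY_PAGE_PX = 990;

// Height left for the body once header, footer and the top and bottom
// margins are subtracted from one full page.
function bodyHeight(headerPx, footerPx, marginPx) {
  return BODY_PAGE_PX - (headerPx + footerPx + 2 * marginPx);
}

console.log(bodyHeight(150, 50, 15)); // 760
```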

You’ll have to render the PDF twice in PhantomJS. On the first render
you can find out how many pages the document has (a good way is to
capture the page count in the footer callback). Then you can extend
your content wrapper div to the entire height, and render the page
again to get a repeating page frame border for the entire document.
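
Here is a rough sketch of grabbing the page count from the footer
callback. phantom.callback and its (pageNum, numPages) arguments are
the standard PhantomJS header/footer mechanism; the function name, the
'Letter' format, the '1cm' footer height and the footer markup are
placeholders of mine. The phantom object is passed in as a parameter
only so the function can be tried outside PhantomJS.

```javascript
// Build a paperSize object whose footer callback reports the total
// page count. `phantom` is PhantomJS's global object.
function paperSizeWithPageCount(phantom, onPageCount) {
  return {
    format: 'Letter',   // placeholder
    footer: {
      height: '1cm',    // placeholder
      contents: phantom.callback(function (pageNum, numPages) {
        // Invoked once per page; numPages is the same total each time.
        onPageCount(numPages);
        return '<div style="text-align:right">' +
               pageNum + ' / ' + numPages + '</div>';
      })
    }
  };
}
```

Inside a PhantomJS script you would assign the result to page.paperSize
before the first page.render call, remember the count handed to
onPageCount, and use it to size the wrapper before rendering again.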

For example, if your document is 4 pages long, the height of your
wrapper div (the one that contains all the content of your document)
needs to be 4 * 760px = 3040px. If you don’t do this, you’ll see the
body border end prematurely, and you’ll get whitespace instead of a
frame border.
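
As a sketch, the wrapper height is just the page count times the body
height. wrapperHeight is a hypothetical name, and 760 is the body
height from the example above.

```javascript
// Total height the content wrapper needs so the side borders run the
// full length of every page.
function wrapperHeight(numPages, bodyHeightPx) {
  return numPages * bodyHeightPx;
}

console.log(wrapperHeight(4, 760) + 'px'); // 3040px
```

One way to apply it, I’d assume, is to set the wrapper’s style.height
from inside page.evaluate before the second render.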

Of course, you’ll need to fudge the heights of your header/footer by
±1 pixel to really get it spot on.

Images (Header / Footer)

Images in the header or footer need to be passed as base64-encoded
text (a data URI), and also need to be included in the body, hidden
via the style attribute ‘display:none’. Because PhantomJS renders
asynchronously, an image that isn’t already ‘cached’/loaded by the
time the headers are rendered simply won’t display.
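
A small sketch of the two pieces of markup involved. The helper names
are hypothetical, the image is assumed to be a PNG, and the base64
string in the usage lines is a stand-in for your real image data.

```javascript
// Wrap base64-encoded PNG data in a data URI.
function pngDataUri(base64Png) {
  return 'data:image/png;base64,' + base64Png;
}

// Hidden copy for the body, so the image is already loaded by the
// time the header is rendered.
function hiddenPreloadTag(dataUri) {
  return '<img src="' + dataUri + '" style="display:none">';
}

// Visible copy for the header (or footer) markup itself.
function headerImageTag(dataUri) {
  return '<img src="' + dataUri + '">';
}

var logoUri = pngDataUri('BASE64_DATA_HERE'); // placeholder data
console.log(hiddenPreloadTag(logoUri));
console.log(headerImageTag(logoUri));
```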

Think of the header and footer as sandboxed from each other and from
the rest of the document, so CSS styles don’t carry over between them.

SVG and Assets

SVG images need to be embedded as base64-encoded text, and this is
best done through an image tag. Raw SVG works, but isn’t properly
“namespaced” (for lack of a better term that describes what happens),
so if you have multiple SVG charts or graphs, it’s likely the styles
will bleed over. For example, if you add a second graph, the style
settings of the second graph will override the first. With three
charts, the first two end up re-styled according to the last SVG.
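
A minimal sketch of wrapping each SVG in its own image tag. It assumes
the HTML is assembled in Node before being handed to PhantomJS, hence
Buffer; inside a page, window.btoa would be the rough counterpart. The
function name svgToImgTag is made up.

```javascript
// Encode SVG markup as a base64 data URI inside an <img> tag, so each
// chart is rendered as a self-contained image.
function svgToImgTag(svgMarkup) {
  var base64 = Buffer.from(svgMarkup, 'utf8').toString('base64');
  return '<img src="data:image/svg+xml;base64,' + base64 + '">';
}

var chart = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">' +
            '<rect width="10" height="10" fill="steelblue"/></svg>';
console.log(svgToImgTag(chart));
```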

Local assets (localhost/127.0.0.1) will not be loaded, but file:// URIs
seem to work.

Fonts are funky. Font weights seem to be ignored, so you’ll have to
name each font explicitly instead of treating them properly as a
family.

Hope I saved you some frustration. If anyone has clearer ideas
regarding PhantomJS behavior, I’d welcome the clarification. Thanks!