url-to-pdf-api

URL to PDF Microservice

⚠️ WARNING ⚠️ Don't serve this API publicly on the internet unless you are aware of the
risks. It allows API users to run any JavaScript code inside a Chrome session on the server.
It's fairly easy to expose the contents of files on the server. You have been warned! See https://github.com/alvarcarto/url-to-pdf-api/issues/12 for background.

⭐️ Features:

Converts any URL or HTML content to a PDF file or an image (PNG/JPEG)

Rendered with Headless Chrome, using Puppeteer. The PDFs should match the ones generated with desktop Chrome.

Sensible defaults, but everything is configurable.

Single-page app (SPA) support. Waits until all network requests are finished before rendering.

How it works

Local setup is identical, except that the Express API runs on your machine
and requests connect to it directly.

Good to know

By default, the page's @media print CSS rules are ignored. We set Chrome to emulate @media screen to make the default PDFs look more like the actual sites. To get results closer to desktop Chrome, add the &emulateScreenMedia=false query parameter. See more in the Puppeteer API docs.

Chrome is launched with the --no-sandbox --disable-setuid-sandbox flags to make it work on Heroku. See this issue.

Heavy pages may cause Chrome to crash if the server doesn't have enough RAM.
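As an illustration of the emulateScreenMedia parameter described above, here is a minimal sketch that builds a request URL with the parameter turned off. The base URL (http://localhost:9000), the /api/render path, and the helper name buildRenderUrl are assumptions made for this example; substitute the values of your own deployment.

```javascript
// Minimal sketch: build a request URL that disables screen media emulation.
// ASSUMPTIONS: the base URL, the '/api/render' path and the helper name are
// placeholders for illustration, not confirmed by this section of the readme.
function buildRenderUrl(apiBase, targetUrl, emulateScreenMedia) {
  const endpoint = new URL('/api/render', apiBase);
  // searchParams percent-encodes the target URL for us.
  endpoint.searchParams.set('url', targetUrl);
  endpoint.searchParams.set('emulateScreenMedia', String(emulateScreenMedia));
  return endpoint.toString();
}

const requestUrl = buildRenderUrl('http://localhost:9000', 'https://example.com', false);
console.log(requestUrl);
```

Using URL and searchParams instead of string concatenation avoids encoding mistakes when the target URL contains its own query string.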

Examples

Note: the demo Heroku app runs on a free dyno, which sleeps after being idle.
A request to a sleeping dyno may take as long as 30 seconds.

API

To understand the API options, it's useful to know how Puppeteer
is used internally by this API. The render code
is really simple; check it out. Render flow:

page.setViewport(options) where options matches viewport.*.

Possibly page.emulateMedia('screen') if emulateScreenMedia=true is set.

Render url or html.

If url is defined, page.goto(url, options) is called and options match goto.*.
Otherwise page.goto(`data:text/html,${html}`, options) is called where html is taken from request body. This workaround was found from Puppeteer issue.

Possibly page.waitFor(numOrStr) if e.g. waitFor=1000 is set.

Possibly scroll the whole page to the end before rendering if e.g. scrollPage=true is set.

Useful if you want to render a page which lazy loads elements.

Render the output

If output is pdf, rendering is done with page.pdf(options) where options matches pdf.*.

Else, if output is screenshot, rendering is done with page.screenshot(options) where options matches screenshot.*.
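The render flow above can be sketched roughly as follows. This is not the project's actual render code, only a condensed illustration under stated assumptions: page is a Puppeteer Page-like object, opts is an already parsed options object with defaults applied, and the names render and scrollToEnd are hypothetical. The Puppeteer method names are the ones used in this readme; later Puppeteer releases have renamed or removed some of them.

```javascript
// Condensed sketch of the render flow described above.
// ASSUMPTIONS: `render`, `scrollToEnd` and the shape of `opts` are
// illustrative; `page` only needs the Puppeteer Page methods called here.

// Hypothetical helper: jump to the bottom of the document inside the page.
// Pages that load content in several steps may need repeated scrolling.
async function scrollToEnd(page) {
  await page.evaluate(() => {
    window.scrollTo(0, document.body.scrollHeight);
  });
}

async function render(page, opts) {
  // 1. Viewport options come from viewport.*
  await page.setViewport(opts.viewport);

  // 2. Optionally emulate @media screen instead of @media print.
  if (opts.emulateScreenMedia) {
    await page.emulateMedia('screen');
  }

  // 3. Load either the URL or the raw HTML (via a data: URL).
  if (opts.url) {
    await page.goto(opts.url, opts.goto);
  } else {
    await page.goto(`data:text/html,${opts.html}`, opts.goto);
  }

  // 4. Optional extra wait, e.g. waitFor=1000.
  if (opts.waitFor !== undefined) {
    await page.waitFor(opts.waitFor);
  }

  // 5. Optional scroll, useful for lazy-loaded elements.
  if (opts.scrollPage) {
    await scrollToEnd(page);
  }

  // 6. Produce the output: a screenshot if requested, otherwise a PDF.
  if (opts.output === 'screenshot') {
    return page.screenshot(opts.screenshot);
  }
  return page.pdf(opts.pdf);
}
```

With a real browser, page would come from something like browser.newPage() after launching Puppeteer; because the function only depends on the methods it calls, it can also be exercised with a stub object in tests.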

Development

To get this thing running, you have two options: run it in Heroku, or locally.

The code requires Node 8+ (async, await).

1. Heroku deployment

Scroll up this readme to the Deploy to Heroku button. Click it and follow
the instructions.

WARNING: Heroku dynos have very little RAM. Rendering heavy pages
may cause the Chrome instance to crash inside a Heroku dyno. 512MB should be
enough for most real-life use cases such as receipts. Some news sites may need
as much as 2GB of RAM.