DocRaptor is a great service that converts HTML to PDF and is also available as a Heroku add-on. The initial integration is straightforward in a Rails app using the docraptor gem.

However, things get complicated in the case of very large PDFs. Recently we encountered a case where the PDF took more than 10 seconds to generate. We naturally considered moving the printing process to a background job.

We split the printing process into steps and found that the DocRaptor conversion is the most time-intensive one:

1. The user makes a request to download a report.
2. Our app generates the HTML code of the report.
3. Our app sends the HTML code to DocRaptor.
4. DocRaptor converts the HTML to PDF.
5. We serve the PDF back to the user.
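One way to find out which step dominates is to time each one separately. The following is a minimal sketch using Ruby's standard `Benchmark` module; the `time_steps` helper and the three lambdas are hypothetical stand-ins for the real work (the `sleep` only simulates a slow remote conversion), not the code we actually ran.

```ruby
require "benchmark"

# Runs the steps in order, feeding each step the previous step's result.
# Returns the final result together with a { step_name => seconds } hash.
def time_steps(steps, input = nil)
  timings = {}
  result = input
  steps.each do |name, step|
    timings[name] = Benchmark.realtime { result = step.call(result) }
  end
  [result, timings]
end

# Hypothetical stand-ins; replace each lambda with the app's own call.
steps = {
  generate_html: ->(_) { "<h1>Report</h1>" },
  convert_to_pdf: lambda do |html|
    sleep 0.05 # simulates the round trip to the conversion service
    "%PDF-1.4 (#{html.length} characters of HTML)"
  end,
  serve_pdf: ->(pdf) { pdf }
}

_pdf, timings = time_steps(steps)
slowest = timings.max_by { |_name, seconds| seconds }.first
puts "Slowest step: #{slowest}"
```

Timing real requests this way, ideally over several runs, shows whether the conversion is worth moving out of the request cycle.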

Initially, we had a controller (reports) with an action (print) that handled the entire process synchronously. To move this to a delayed job:

Step 1

Create a new action on the reports controller called schedule_print. It sets up a file name and the path of the page that generates the HTML code, then creates a delayed job instance. We use a ParametrizedJob model so that we can send progress notifications to the user while they wait for the delayed job to finish running. We found that this alleviates most of the user's pain and is a better alternative to simply waiting.
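The core of schedule_print can be sketched in plain Ruby as follows. This is only an illustration: `PrintJobRecord` is a hypothetical Struct standing in for the ParametrizedJob model (which in a Rails app would be an ActiveRecord model), and the file name format, the path layout and the `"pending"` status are assumptions made for this example.

```ruby
require "securerandom"

# Hypothetical stand-in for the ParametrizedJob model.
PrintJobRecord = Struct.new(:id, :file_name, :html_path, :status, keyword_init: true)

# Collects what the background job will need later: a file name for the PDF
# and the path of the page whose HTML will be converted.
def build_print_job(report_id, now: Time.now)
  PrintJobRecord.new(
    id: SecureRandom.uuid,
    file_name: "report-#{report_id}-#{now.strftime('%Y%m%d%H%M%S')}.pdf",
    html_path: "/reports/#{report_id}/print",
    status: "pending" # created but not yet scheduled for processing
  )
end

job = build_print_job(42, now: Time.utc(2024, 1, 2, 3, 4, 5))
puts job.file_name
puts job.html_path
```

In the controller, the action would save a record like this and then redirect the user to its show page.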

Step 2

The user is redirected to the show page of the newly created job instance. There, the browser subscribes to a Pusher channel and, once the subscription succeeds, the job is scheduled for processing in the background. For more information, see our blog post about Pusher. The user stays on a waiting page until the job finishes.
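Scheduling the job only after the subscription succeeds means no status update can be sent before the browser is listening. Here is a minimal sketch of that hand-off, with a hypothetical in-memory `PrintQueue` standing in for Delayed Job and a `WaitingJob` Struct standing in for the job record. The guard against scheduling the same job twice (for example after a page reload) is an addition made for this sketch, not something our original code is claimed to do.

```ruby
# Hypothetical in-memory queue standing in for the real background queue.
class PrintQueue
  attr_reader :enqueued_ids

  def initialize
    @enqueued_ids = []
  end

  def enqueue(job_id)
    @enqueued_ids << job_id
  end
end

# Hypothetical stand-in for the job record shown on the waiting page.
WaitingJob = Struct.new(:id, :status)

# Called when the browser reports a successful channel subscription.
# Only a pending job is enqueued, so a repeated call does nothing.
def start_after_subscription(job, queue)
  return false unless job.status == "pending"

  job.status = "queued"
  queue.enqueue(job.id)
  true
end

queue = PrintQueue.new
job = WaitingJob.new(7, "pending")
start_after_subscription(job, queue) # schedules the job
start_after_subscription(job, queue) # ignored, already queued
puts queue.enqueued_ids.inspect
```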

Step 3

When the job reaches the front of the queue, we start processing it. This step is the most complicated, as it handles setting up the DocRaptor parameters and getting the HTML content of the reports#print page.
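The overall shape of such a job can be sketched without Rails, DocRaptor or Pusher by injecting the three collaborators as callables. Everything here is hypothetical: the `PrintReportJob` class name, the status strings and the lambda signatures are assumptions for illustration. In a real app, `convert` would wrap the DocRaptor call and `notify` would publish to the user's Pusher channel. Delayed Job can run any object that responds to `perform`, which is why the work lives in that method.

```ruby
# A background job with injected collaborators:
#   fetch_html - returns the HTML of the print page for a given path
#   convert    - takes HTML and a file name, returns the PDF contents
#   notify     - receives a Hash describing the current status
class PrintReportJob
  def initialize(html_path:, file_name:, fetch_html:, convert:, notify:)
    @html_path = html_path
    @file_name = file_name
    @fetch_html = fetch_html
    @convert = convert
    @notify = notify
  end

  def perform
    @notify.call({ status: "generating_html" })
    html = @fetch_html.call(@html_path)

    @notify.call({ status: "converting" })
    pdf = @convert.call(html, @file_name)

    @notify.call({ status: "completed", file_name: @file_name })
    pdf
  rescue StandardError => e
    # Tell the waiting user about the failure, then let the queue see the error.
    @notify.call({ status: "failed", message: e.message })
    raise
  end
end

updates = []
job = PrintReportJob.new(
  html_path: "/reports/42/print",
  file_name: "report-42.pdf",
  fetch_html: ->(path) { "<h1>Report from #{path}</h1>" },
  convert: ->(html, _name) { "%PDF-1.4 (#{html.bytesize} bytes of HTML)" },
  notify: ->(update) { updates << update }
)
job.perform
puts updates.map { |update| update[:status] }.join(" -> ")
```

Keeping the collaborators injectable also makes the job easy to exercise in tests without touching the network.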

Step 4

The user is then notified, via Pusher, of the state of the document. The user's browser receives a message at each status update (either a simple text notification or a signal that the document is complete), and JavaScript is responsible for handling these messages. This part should be straightforward.

Users are much happier watching a spinning gear and status updates than waiting for a single long request to load. This also takes some load off our Heroku dynos, since they no longer need to stay occupied while DocRaptor is working.