My development log was spitting out numerous instances of “String#to_html was called without RedCloth being successfully required”. This turned out to be a conflict between Active Admin and Stringex: both declare a to_html method, and because of the load order, Stringex’s version prevails.

The latest updates to Active Admin remove this conflict, but the fix has not yet made it into a release. To get it, explicitly point your Gemfile at the master branch:
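A Gemfile entry along these lines should do it. The `github:` shorthand and the repository name `activeadmin/activeadmin` are assumptions; check the project's README for where the source currently lives.

```ruby
# Gemfile
# Assumption: the repository lives at github.com/activeadmin/activeadmin.
# Remove the git source again once a release includes the fix.
gem "activeadmin", github: "activeadmin/activeadmin", branch: "master"
```

Run `bundle install` afterwards so the lockfile picks up the git source.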

So PDFKit is totally awesome, but there are a few gotchas to be aware of if you are on a single-threaded server (`rails server`) or a read-only filesystem (Heroku).

The first thing to note is that with the introduction of the asset pipeline, assets (images, stylesheets, javascripts) are no longer served up directly but are first processed by your Rails app and then served. This means that in a single-threaded process (`rails server` or Heroku with a single dyno), PDFs generated with an image or external stylesheet will stall and time out. While the only thread is busy generating the PDF, wkhtmltopdf sends another request to the same server to fetch the external asset, and we find ourselves in a deadlock: the PDF request waits on wkhtmltopdf, and wkhtmltopdf waits on a request that cannot be served until the PDF request finishes.

For stylesheets, the simplest solution I found was to place your styles inline, perhaps conditionally, using this helpful bit of code:

def request_from_pdfkit?
  # when generating a PDF, PDFKit::Middleware will set this flag
  request.env["Rack-Middleware-PDFKit"] == "true"
end
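To show how that check might drive the inline-versus-linked decision, here is a minimal sketch that runs outside Rails. `PdfStyleHelper`, `pdf_aware_styles`, `FakeRequest` and `FakeView` are made-up names for illustration; in a real app the helper would live in a view helper module and `request` would be the real Rails request.

```ruby
require "erb"

# Minimal stand-in for a Rails request; only `env` is needed here.
FakeRequest = Struct.new(:env)

# Hypothetical view helper: not part of PDFKit or Rails.
module PdfStyleHelper
  # Same check as above: PDFKit::Middleware sets this flag while rendering a PDF.
  def request_from_pdfkit?
    request.env["Rack-Middleware-PDFKit"] == "true"
  end

  # Inline the CSS for PDF requests; link to the stylesheet otherwise.
  def pdf_aware_styles(css, href)
    if request_from_pdfkit?
      "<style>#{css}</style>"
    else
      %(<link rel="stylesheet" href="#{ERB::Util.html_escape(href)}">)
    end
  end
end

# Tiny view-like object so the helper can be exercised without Rails.
class FakeView
  include PdfStyleHelper
  attr_reader :request

  def initialize(env)
    @request = FakeRequest.new(env)
  end
end

css = "body { font-family: sans-serif; }"
puts FakeView.new({ "Rack-Middleware-PDFKit" => "true" }).pdf_aware_styles(css, "/assets/app.css")
puts FakeView.new({}).pdf_aware_styles(css, "/assets/app.css")
```

Normal browser requests still get the linked stylesheet, so only the PDF path pays the cost of inlining.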

You could also try serving a stylesheet directly from the public directory (bypassing the asset pipeline), but I have not tested this.

With images it is a little trickier. We still cannot use the asset pipeline, for the reasons already discussed. If we give a full URL such as “http://example.com/image.png”, the server hangs and times out, and if we provide a relative path to an image in the public directory, PDFKit doesn’t seem to find it. So what to do?!

It turns out that when wkhtmltopdf encounters an image tag with a relative path, it actually looks for the image at the full UNIX path. So, we create a new image_tag helper that will give the full UNIX path to the file in question:
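One way to write such a helper, as a sketch only: `pdf_image_tag` is a made-up name, the `public/images` location is an assumption, and the tag is built by hand so the example runs without Rails. In a Rails app you would pass `Rails.root` as the root.

```ruby
require "erb"

# Hypothetical helper: builds an <img> tag whose src is the absolute
# filesystem path of an image under <root>/public/images.
def pdf_image_tag(image, root, attrs = {})
  src = File.join(root.to_s, "public", "images", image)
  html_attrs = attrs.merge(src: src).map do |name, value|
    %(#{name}="#{ERB::Util.html_escape(value.to_s)}")
  end
  "<img #{html_attrs.join(' ')}>"
end

puts pdf_image_tag("logo.png", "/app", alt: "Logo")
```

Because the path points at the filesystem, this only works for files that actually exist on the machine running wkhtmltopdf.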

So you are getting this error when you try to start your server, or when you’re in irb and do a “require 'rubygems'”, but you can clearly see that gem -v works and you can even list your gems with “gem list”… So why can’t irb find rubygems?!

Well, in my case, it seems ruby was looking for rubygems in the same directory as itself and could not find it there, but did not continue to search in other directories. Namely:

which ruby # => /opt/local/bin

which gem # => /usr/local/bin

The solution, though hard to find, was simple. Use the same directory for gem and ruby. Since I had a version of ruby in /usr/local/bin I simply had to reorder the preference in which to search for binaries putting /opt/local/bin (MacPorts) at the end instead of the beginning. To do this I opened up ~/.bash_profile and changed

export PATH=/opt/local/bin:/opt/local/sbin:/opt/local/var:$PATH
to
export PATH=$PATH:/opt/local/bin:/opt/local/sbin:/opt/local/var
If you are unable to see the difference, in the second line I have $PATH at the beginning instead of at the end.
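The order matters because a command name is resolved by walking the PATH entries left to right and taking the first match. A rough Ruby sketch of that lookup follows; `first_on_path` is a made-up name, and real shells also cache lookups and handle builtins, which this ignores.

```ruby
# Return the first executable named `cmd` found in `path`, scanning the
# directories left to right. Empty PATH entries are skipped for simplicity.
def first_on_path(cmd, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR)
      .reject(&:empty?)
      .map { |dir| File.join(dir, cmd) }
      .find { |candidate| File.file?(candidate) && File.executable?(candidate) }
end

# If these two print different directories, ruby and gem come from
# different installations.
puts first_on_path("ruby").inspect
puts first_on_path("gem").inspect
```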

I often want to do some simple HTML parsing, to find tags or run validations, and I am left frustrated by the lack of documentation on how to use what is built into Rails. Sure, there are great gems like Nokogiri, but why add a big fat gem to your list of dependencies when it’s all built right in already?

That is perhaps the most frustrating part about it. I know from testing that Rails definitely has some HTML parsing punch, but where’s the documentation?

Well, I finally did some digging and here’s what I found.

The first trick is converting your HTML document, presumably a String, into an HTML object that Rails can manipulate:

Excellent. Now we can loop through all of the nodes and check if any of them have the class “content”. But first we need to convert the nodes from a string to an HTML object. Tokenizer will loop through HTML tags and text blocks and hand them back to us as Strings so we can parse them:
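To make the idea concrete without depending on a particular Rails version, here is a plain-Ruby sketch of what tokenizing into tags and text blocks looks like, followed by the class check. This is not the Rails tokenizer: `tokenize_html` and `has_class?` are names made up for illustration, and the regular expressions are deliberately naive.

```ruby
# Split an HTML string into tag tokens ("<p>", "</p>") and text tokens
# ("Hello"), each handed back as a plain String. Naive on purpose: it does
# not handle comments, CDATA, or ">" inside attribute values.
def tokenize_html(html)
  html.scan(/<[^>]*>|[^<]+/)
end

# True when the token is an opening tag whose class attribute includes `name`.
def has_class?(token, name)
  return false unless token.start_with?("<") && !token.start_with?("</")

  match = token.match(/\bclass\s*=\s*["']([^"']*)["']/)
  !match.nil? && match[1].split.include?(name)
end

html = '<div class="content main"><p>Hello</p></div>'
tokens = tokenize_html(html)
puts tokens.inspect
puts tokens.any? { |token| has_class?(token, "content") }
```

A real parser turns each token into a node object with a tag name and attributes, which is sturdier than matching on the raw string as done here.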