Python, technology, Seattle, careers, life, et cetera…

Integrating reCAPTCHA with Django

Background

We didn’t initially build captchas into TrenchMice, because we simply didn’t think they would be necessary.

By September 2006, the site started receiving spam comments. They were the usual gibberish you see in blog spam: Lots of links, garbage words, and bogus e-mail addresses. (Whenever I see this stuff, I shake my head and wonder why script kidz waste their time generating it. Then I remember it’s because, out of the gazillions of spam messages, some recipients click on the links, making spam financially rewarding. And then I get slightly depressed about the average Internet user. But, I digress…)

So we broke down and added a captcha system using PIL and our own algorithms. Our images were simple, as modern captchas go:

But they got the job done, with an acceptable load on our servers.

We decided to upgrade the captcha technology for two reasons.

Lately, we’ve noticed more doorknob-jiggling activity. This hasn’t yet resulted in comment spam, but it indicates the site is getting more attention from spammers. I don’t want to wait until there’s a successful attack to better secure the site.

We’re not interested in developing a core competency in captcha design. The initial captchas were easy to do, but we don’t want to invest time into learning the latest and greatest imaging techniques now that more work is required.

Why reCAPTCHA?

Their system does useful work by correcting OCR text from digitized books. This is rather cool.

They claim excellent system availability for their users, and expect to be in business for years. There are no indications to the contrary.

If a hacker cracks their images, they promise to respond quickly by tweaking their algorithms. So we won’t have to do much besides add our voice to the “Please fix this” thread that would presumably get created in their support newsgroup.

There was nothing to install. (But PyCrypto was needed for Mailhide.) The API looked fairly easy. And it was free.

The captcha_error template variable is "&error=ERROR_CODE" if we’re re-displaying a bad form after a POST. Otherwise, it’s an empty string. I vacillated over moving this into the view’s form class for 30 minutes, but I kept it in the template because:

TrenchMice has a mix of oldforms and newforms, because we agreed to upgrade pages to newforms only if we edit them for another reason (i.e., fixing a bug or changing the form for some other reason). We haven’t done all of them yet. I didn’t want to procedurally trigger an update of the remaining oldforms-based views that use captchas; and if I chose to ignore this self-imposed rule, I didn’t want to further burden them with even more code that would eventually have to be updated.

This is a case where the control–presentation distinction wasn’t clear. Forced to choose one or the other, the reCAPTCHA display belongs in presentation, because it’s JavaScript from another site with core algorithms outside of TrenchMice’s control. Of course, the view has the reCAPTCHA API code to evaluate the user’s response.
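For readers who want to see the shape of the captcha_error logic, here is a minimal sketch in plain Python. It is not TrenchMice’s actual view code: the helper names `captcha_error_param` and `template_context` are made up for illustration, and "incorrect-captcha-sol" is used only as an example error code. The idea is simply that the view hands the template either an empty string or an "&error=ERROR_CODE" fragment, and the template appends it to the reCAPTCHA script URL.

```python
from urllib.parse import quote


def captcha_error_param(error_code=None):
    """Return "&error=CODE" after a failed captcha, else an empty string."""
    if not error_code:
        return ""
    # Percent-encode the code so an unexpected value cannot break the URL.
    return "&error=" + quote(error_code, safe="")


def template_context(is_post, captcha_ok, error_code=None):
    """Hypothetical helper: build the context a view passes to its template."""
    if is_post and not captcha_ok:
        # Re-displaying a bad form after a POST: tell the widget what went wrong.
        return {"captcha_error": captcha_error_param(error_code)}
    # First display, or the captcha was fine: nothing to append.
    return {"captcha_error": ""}
```

Keeping the fragment as a plain string in the context means oldforms and newforms views can both supply it without any form-class changes, which is the trade-off described above.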

I also swapped out our PIL-based e-mail address obfuscation for the reCAPTCHA Mailhide API. Recaptcha-client had code for this too, and it was easy to hook up. So easy that I won’t bother writing about it. 🙂

The end result

The reCAPTCHA captchas work great, and the total amount of view and template code decreased. Our simple captchas had small view hacks to handle the case of re-displaying a form that had a good captcha response but a problem in another field. That code, however minor, is now gone. We also had a background script to clean the captcha image file directory — gone. We also had a font directory for the images — gone.

Visually, the styling isn’t completely in keeping with the rest of the page. But it’s perfectly acceptable. We had been displaying a simple blue box with green text, and I can’t claim that was visually wonderful.

I’m one of the engineers on the reCAPTCHA team. Glad to hear you are happy with reCAPTCHA.

About the styling, did you see the page at http://recaptcha.net/apidocs/captcha/client.html? It should let you customize the look and feel to your heart’s content. If you didn’t find that, I’d appreciate it if you could shoot us an email (support@recaptcha.net) telling us where you looked for this information. We want to make the theming easier to find. (I think it’s a slick API, but I’m sort of biased.)

You wrote a nice experience summary, and I enjoyed reading it, since it is in the area of my recent interest: CAPTCHA.

We developed a captcha control for .NET 2.0 after our experiences with other captchas. If you are interested, you can try downloading it from http://www.floresense.com. I would be happy to have feedback from a person like you using it.

We also have a free, simple captcha service that might be interesting for small sites as well.

Yes, I did read that section. But reCAPTCHA’s styling with the white theme is sufficient for our needs, so we didn’t feel the need to use your custom theme hook.

Our captcha images heretofore haven’t stylistically been anything to write home about. What I didn’t say, but should have clearly said, is that reCAPTCHA is slicker looking than our simple blue rectangle, so it’s actually an improvement.

If we sometime have time to burn (ha!), we’ll look at better styling for it.

I gotta say, the comment by “Harish” is some damn clever spam–and pretty ironic considering this is a post about captchas, of all things.

Notice how he doesn’t actually address the content of the post, merely referring to it as an “experience summary”–terms that could be applied to almost any blog post.

Also note how he includes the link to his website three times. And note that the **product** he is **promoting** has nothing to do with Django–it’s written for a totally different platform.

My guess is “Harish” is really a spam bot programmed to leave comments on blog posts that it infers are discussions regarding captchas.

It’s pretty ironic the double standard most people have regarding promotion. My guess is most folks would feel significantly uncomfortable writing a comment as self-promotional as Harish’s, but when Harish does it we don’t bat an eye.

I have a classifieds website, http://www.inchis.com, and I keep getting spam on it. The issue is that I am using a shared hosting platform which is very hostile to third parties. What can I do about that?

You know, @Harish is not really spam. Mentioning your website thrice in a post just means you are emphasizing it. If I were to mention my site, that is http://www.inchis.com, a number of times while referring to a problem, it would not make it spam. Think about that. Spam is a web designer’s nightmare; we need a solution to