CAPTCHA-less Security: Accessibility of CAPTCHA

CAPTCHA is perceived as a quick and effective way to stop bots from performing abusive actions on a website. But how accessible are these challenges for persons with disabilities? Karl Groves finds out.

Bots are often deployed to do things like automatically enter spam into email forms or comment forms. They can also be used to submit fraudulent entries in other forms, such as registration forms or voting forms. CAPTCHA works by presenting a challenge to the user (typically in the form of an image containing jumbled-up letters) which must be solved to proceed in the interaction flow.

On the surface, CAPTCHA seems perfect because bots only have access to that which is in the document source. Text within images cannot be seen by an internet bot and therefore the bot cannot submit a response to the challenge. This is also why CAPTCHA is an accessibility problem. Requiring vision to solve the CAPTCHA locks out all persons who are blind. Lest we think only persons who are blind are impacted, CAPTCHAs can often also lock out low-vision users and those with dyslexia – particularly when there’s a lot of “noise” in the image.

Some have attempted to create alternate versions of the typical image CAPTCHA, such as the well-known reCAPTCHA, which combines audio with the image. In nearly all cases, some problems with accessibility still remain. For instance, reCAPTCHA is still inaccessible to the 45,000 to 50,000 Deaf-blind persons in the United States.

CAPTCHA is also not as effective as some may believe. Automated means of beating CAPTCHA have been around since 2003. As CAPTCHA techniques advance, so do the means of beating them. There are even services which will employ humans to beat CAPTCHAs.

Keep this in mind at all times when considering CAPTCHA or any other security approach on your site: the level of effort expended on abusing a system is directly proportional to the perceived benefit gained by the abuser. This applies to the recommendations I make below as well. CAPTCHA is, in many cases, very effective. Otherwise websites wouldn’t use it. But it does lock real people out of your site and it can be beaten. For those reasons, I’d like to discuss some approaches to thwarting website abuse without CAPTCHA.

CAPTCHA-less Security Approaches

Because all of the code for all of my sites (except this one, ironically) is home-grown, I’ve developed my own code to handle security as well. This has its advantages and disadvantages: it took a long time of learning (some of it painful, to be honest) to get my code where it is today, but I’m proud to say that, using the approaches below, I’ve wholly eliminated spam and fraudulent registrations on my sites that use this code. Keep in mind, the more attractive a site is for abuse, the harder abusive users will try to find exploits. As I said earlier, in certain scenarios even humans can be employed to simply overcome whatever automated methods you have in place to fight abuse. Here’s what I’ve used with success.

Filter, Validate, Escape

Not directly related to CAPTCHA is the need to filter, validate, and escape all input. This is something every developer should be doing at all times when developing systems which utilize forms. It is also a topic that could fill several posts on security. Instead, I’d like to point you to Chris Shiflett and encourage you to read his articles, blog, and the books he’s written on this topic. I’ll go over some of these topics here and encourage you to check out Chris Shiflett’s work for more details.

Filter all input

Input filtering is the method by which you validate all incoming data and prevent any invalid data from being used by your application. It’s very similar in theory to how water filtering works, where impurities in water are not allowed to pass (Chris Shiflett).

In my approach, I filter all input from superglobals. Any key from a superglobal array that I do not expect is automatically removed. For instance, if I’m only expecting ‘id’ from $_GET, then that is the only key that is kept. Furthermore, I strip out any input I consider out of bounds for the type of content expected. For instance, if I’m expecting a number for the value of ‘id’, then all non-numeric characters are stripped. If I’m expecting alphanumerics, then anything that is not a letter or a number is stripped, and so on.
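The filtering described above might look something like the following. This is only a minimal sketch, not the author's actual code: the function name `whitelist_input` and the rule names 'int' and 'alnum' are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: keep only the expected keys and strip any characters
// that are out of bounds for the expected type.
// $rules maps each expected key to a type name ('int' or 'alnum').
function whitelist_input(array $input, array $rules): array
{
    $patterns = [
        'int'   => '/[^0-9]/',        // everything that is not a digit
        'alnum' => '/[^a-zA-Z0-9]/',  // everything that is not a letter or digit
    ];

    $clean = [];
    foreach ($rules as $key => $type) {
        // Drop missing keys, array-valued input, and unknown rule names.
        if (!isset($input[$key]) || !is_string($input[$key]) || !isset($patterns[$type])) {
            continue;
        }
        $clean[$key] = preg_replace($patterns[$type], '', $input[$key]);
    }
    return $clean; // keys not listed in $rules never make it into the result
}

// Usage: only 'id' is expected from the query string.
$get = whitelist_input($_GET, ['id' => 'int']);
```

With this sketch, a request such as `?id=42abc&debug=1` would leave only `['id' => '42']`: the unexpected `debug` key is discarded and the non-numeric characters are stripped from `id`.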

Validate everything

Validate strongly to ensure that input adheres to very specific constraints. In the PHP forms class I created for use on all my sites, I have 48 different types of validation ranging from simple string length validation to rather involved regular expressions. The type of validation in use in the final implementation depends on the type of expected input, but everything is validated in some way after input filtering. Even if a field isn’t required, it still gets filtered and validated against various rules meant to prevent abuse. For some of the validation rules, the user is permanently blocked from access as soon as a submission fails.
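As a rough illustration of rule-based validation, here is a small sketch with three rules. The rule names, length limits, and the `validate_field` function are all hypothetical; the forms class mentioned above is not shown in this article and certainly differs.

```php
<?php
// Hypothetical validator: returns true only when $value satisfies the named rule.
function validate_field(string $value, string $rule): bool
{
    switch ($rule) {
        case 'username':
            // 3 to 20 letters, digits, or underscores; \A and \z anchor the
            // whole string so a trailing newline cannot slip through.
            return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $value) === 1;

        case 'email':
            // filter_var() returns the value on success and false on failure.
            return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;

        case 'single_line':
            // Non-empty, at most 200 bytes, and no CR/LF characters
            // (line breaks are a common mail header injection vector).
            return $value !== ''
                && strlen($value) <= 200
                && preg_match('/[\r\n]/', $value) === 0;

        default:
            return false; // unknown rules fail closed
    }
}
```

Failing closed on an unknown rule is a deliberate choice in this sketch: a typo in a rule name rejects the submission rather than silently accepting it.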

Escape output

In this process, any input is escaped to prevent SQL injection, XSS, mail header injection, and so on. Upon accepting a submission, most of these things are validated against rather strong rules in the first place. Any signs of abuse result in immediate banning of the offender. Still, all submissions are escaped, and submissions stored in a database use prepared statements. On the way out, content is escaped as well. This extra step may be seen as redundant, but it acts as an added layer of protection (in this case, protecting the user in case previous steps were inadequate).
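To make the two halves of this concrete, here is a minimal sketch of escaping on the way out and using a prepared statement on the way in. It assumes PDO is available; the function names and the `comments` table with `author` and `body` columns are hypothetical and only serve the example.

```php
<?php
// Escape a value for an HTML context on the way out.
// ENT_QUOTES converts both single and double quotes.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Store a comment with a prepared statement: values are bound as parameters
// and never concatenated into the SQL string.
// The table and column names are assumptions for illustration.
function save_comment(PDO $db, string $author, string $body): void
{
    $stmt = $db->prepare(
        'INSERT INTO comments (author, body) VALUES (:author, :body)'
    );
    $stmt->execute([':author' => $author, ':body' => $body]);
}
```

In a template, the stored value would then be printed with something like `echo escape_html($comment['body']);` so that any markup that survived the earlier steps is rendered as text rather than executed.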

About the author: Karl Groves has nearly a decade of professional experience in web design/development and more than six years’ experience doing usability and accessibility consulting for some of the largest eCommerce and software companies in the world, as well as some of the largest US Federal agencies. Get in touch with Karl Groves.