23 August 2007, Thursday

Core Security announced their new security solution, CORE GRASP. In short, it is meant to distinguish between “good” and “tainted” (i.e. coming from the user/attacker) data and to stop certain functions (like mysql_query()) from working if they are given tainted data that contains dangerous symbols.

I have experimented with similar good/tainted recognition techniques myself (though not at the low level they do it) and am convinced that this is a viable way to help test for various kinds of injections (SQL injection, XSS, code execution, etc.). Note the emphasis on testing here: in my view, production environments would benefit from such solutions only if the overhead were significantly lower. On the other hand, if coders use this as a taint-testing tool during development, they can benefit from it right away.

As Stefan Esser already pointed out, the overhead of handling the tainted flags is currently way too big (~30% according to Core). He also points out several problems with the code (which is, after all, still in its infancy).

Code problems aside (which can be fixed more or less easily, I’m sure), a quick read of their paper revealed two serious design flaws in this product (potential ones; I haven’t tested them yet), which may not be so easy to correct. (Or they may not exist; caveat lector.)

First, “tainted data” is not a trivial concept. That $_GET/$_POST/$_COOKIE contain tainted data is obvious, and the same goes for $_SERVER (although they don’t mention it in the paper; gotta check the source). The trouble is that tainted data can also come from the database, local files and maybe other sources. Since “taintedness” is lost when something is put in the database, an application that relies on GRASP to stop SQL injections, for example, will be unprepared for second-order injection (based on data that the attacker inserted into the database in a previous step).

It can be argued that the “dangerous” data would be stopped the first time, but this really depends on the implementation of both GRASP and the PHP code being protected. Note that with second-order injection, the first step contains only benign data (for example, escaped quotes) that becomes dangerous only when handled at another step. Also, two benign pieces of data may be combined into an injection string. So either GRASP will be so unforgiving as to stop, say, valid use of quotes in user input, or it will let them in, allowing second-order injection attempts to slip under its radar.
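To make the second-order scenario concrete, here is a minimal Python sketch (Python rather than PHP, purely for illustration). Everything in it is hypothetical: the `Tainted` marker class, the `guarded_literal()` check and the `FakeTable` store are stand-ins I made up, not GRASP’s actual mechanism or API. The only assumption that matters is the one from the paragraph above: the taint marker does not survive a round trip through storage.

```python
class Tainted(str):
    """Hypothetical marker type: a string that originated from user input."""


def escape(value):
    """Backslash-escape backslashes and single quotes, keeping the taint marker."""
    return Tainted(value.replace("\\", "\\\\").replace("'", "\\'"))


def has_unescaped_quote(value):
    """Return True if a single quote appears without a preceding backslash."""
    escaped = False
    for ch in value:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "'":
            return True
    return False


def guarded_literal(value):
    """Build a quoted SQL literal, refusing tainted data with a bare quote."""
    if isinstance(value, Tainted) and has_unescaped_quote(value):
        raise ValueError("tainted data contains an unescaped quote")
    return "'" + value + "'"


class FakeTable:
    """Stand-in for a database column: values are serialized to bytes."""

    def __init__(self):
        self.rows = []

    def insert(self, value):
        self.rows.append(value.encode("utf-8"))

    def fetch(self, index):
        # Decoding yields a plain str, so the taint marker is gone.
        return self.rows[index].decode("utf-8")


def run_demo():
    table = FakeTable()
    user_input = Tainted("x' OR '1'='1")

    # Step 1: the application escapes the input, so the guard sees nothing
    # dangerous and the INSERT is allowed.
    first_query = "INSERT INTO names VALUES (" + guarded_literal(escape(user_input)) + ")"
    table.insert(user_input)  # the database ends up holding the unescaped value

    # Step 2: the value is read back and trusted as "our own" data.
    stored = table.fetch(0)
    second_query = "SELECT * FROM users WHERE name = " + guarded_literal(stored)
    return first_query, stored, second_query
```

In this sketch the first query passes the check because the quotes are escaped, and the second query passes because the value is no longer marked as tainted, even though it now carries the bare quotes that change the query’s structure. Whether the real GRASP behaves this way depends on its implementation, as noted above.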

The second trouble is with their implementation of the SQL grammar parser (again, according to the paper, not the actual code). They say:

The protection mechanism for injection attacks can be modeled by a
Finite State Machine (FSM for short) which allow a formal representation of
well-formed strings. The FSM evaluates a predicate and then answers true if
the string does not represent an exploit, and false if it does. We can design a
FSM for each kind of vulnerability, allowing a precise per-character analysis
in order to perform security checks detecting vulnerabilities in cross language
boundaries (e.g., SQL inside PHP, Javascript inside HTML, etcetera.)
(… snip …)
The FSM for this protection was based on MySQL’s lexical analyzer.

(Take this with a pinch of salt though; it has been a long time since I last read my discrete mathematics textbook.)
The trouble is that FSMs can only recognize regular languages, while SQL allows recursive nesting to arbitrary depth and so is not a regular language. For example, you can do this:
SELECT 1;
SELECT (SELECT 1);
SELECT (SELECT (SELECT 1));
.....

Meanwhile, an FSM will only work for a finite number of such recursive steps. True, MySQL itself will also accept only a finite number of those, but it will be a big enough number to make simulating such “finite recursion” with an FSM infeasible. Thus a possible attack against GRASP’s FSM would be to use a large number of nested parentheses (with or without other SQL tokens) and wait for the FSM to run out of states. (This will happen sooner rather than later, because if a non-recursive language requires N states, adding just one level of recursion at M points will need N×M states, adding two needs N×M² states, and so on; it quickly gets out of hand.)
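The argument can be sketched in a few lines of Python. This is only an illustration of the general point, not GRASP’s actual FSM: `bounded_paren_check()` is a hypothetical checker whose nesting depth is one of a fixed number of states, and `states_needed()` simply evaluates the rough estimate above (N states, multiplied by M for each level of nesting).

```python
def nested_select(depth):
    """Return SELECT 1 wrapped in `depth` levels of parenthesised subqueries."""
    query = "SELECT 1"
    for _ in range(depth):
        query = "SELECT (" + query + ")"
    return query


def bounded_paren_check(sql, max_depth):
    """Check parenthesis balance with a depth limited to max_depth.

    The depth can only take the values 0..max_depth, which mimics a machine
    with a fixed number of states. Returns True if the parentheses balance,
    False if they do not, and None if the input nests deeper than the
    machine can track.
    """
    depth = 0
    for ch in sql:
        if ch == "(":
            if depth == max_depth:
                return None  # out of states: the input cannot be classified
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0


def states_needed(n, m, levels):
    """Rough state count from the estimate in the text: n * m**levels."""
    return n * m ** levels
```

With a limit of 3, `bounded_paren_check(nested_select(3), 3)` returns `True`, while `bounded_paren_check(nested_select(4), 3)` returns `None`: the checker has no state left to represent the fourth level. What a real implementation does at that point (reject, accept, or misparse) is exactly the open question. For a sense of scale, with made-up numbers, `states_needed(50, 10, 3)` is 50000.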