Web Application Defense: Bayesian Attack Analysis

Regular Expressions for Input Validation

If your web application defensive strategy against injection attacks relies solely upon blacklist regular expressions for input validation, it is only a matter of time before an attacker finds an evasion. Want proof? Check out our SQL Injection Challenge post mortem. Just to clarify, there is value in using regular expressions for input validation:

Positive Security Model (Whitelisting) - where web application developers (Builders) define what is acceptable input and deny anything that does not match. Examples can be found in the OWASP Validation Regex Repository.
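To make the positive security model concrete, here is a minimal sketch in Python. The field names and patterns are hypothetical, chosen only for illustration; they are not taken from the OWASP Validation Regex Repository. The key points are that the whole value must match and that unknown fields are denied by default.

```python
import re

# Hypothetical whitelist: one pattern per input field (assumed names).
ALLOWED = {
    "username": re.compile(r"[A-Za-z0-9_]{3,16}"),
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
}


def is_valid(field: str, value: str) -> bool:
    """Accept a value only if the entire string matches the field's pattern."""
    pattern = ALLOWED.get(field)
    # Fields without a defined pattern are rejected: deny by default.
    return pattern is not None and pattern.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid("username", "alice_01"))
    print(is_valid("username", "alice' OR '1'='1"))
```

Using fullmatch rather than search matters here: a whitelist that only checks for an acceptable substring would still let trailing attack data through.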

How long does this trial-and-error process last? It depends on the skill level of the attacker and on whether detailed SQL error messages are returned (otherwise it turns into a blind SQL injection attack, which usually takes more time). Here is a quick table listing some time-to-evasion statistics from the SQL Injection Challenge:

Blacklist Filter Evasion Conclusion

Blacklist filtering alone will only slow down determined attackers

Attackers need to try many permutations to identify a working filter evasion
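As a small illustration of why a blacklist only slows attackers down, the sketch below uses a deliberately naive rule. The pattern is an assumption made up for this example and is not an actual OWASP CRS rule. It catches the obvious payload but misses a permutation that replaces the whitespace with an SQL comment, which some databases (MySQL, for example) accept as a token separator.

```python
import re

# A deliberately naive blacklist rule (an assumption for illustration,
# not a real OWASP CRS rule).
BLACKLIST = re.compile(r"union\s+select", re.IGNORECASE)


def is_blocked(payload: str) -> bool:
    """Return True when the payload matches the blacklist pattern."""
    return BLACKLIST.search(payload) is not None


plain = "1 UNION SELECT user, password FROM users"
# Same intent, but an inline comment stands in for the whitespace.
evasion = "1 UNION/**/SELECT user, password FROM users"

if __name__ == "__main__":
    print(is_blocked(plain), is_blocked(evasion))
```

Each such miss can be patched with a broader pattern, but the attacker only needs one permutation that the current rule set does not anticipate.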

Using Bayesian Analysis

Bayesian analysis has achieved great results in Anti-SPAM efforts for email. Why can't we use the same detection logic for HTTP data? Conceptually, we need to look at the HTTP equivalents of each component of Bayesian email analysis:

Data Source

Email – OS-level text files

HTTP – text taken directly from the HTTP transaction

Data Format

Email – MIME headers + email body

HTTP – URI + Request Headers + Parameters

Data Classification

Non-malicious HTTP request = HAM

HTTP attack payloads = SPAM

Conceptually, we should be able to analyze HTTP request traffic using Bayesian analysis to identify an attack probability. Now we just need to figure out what Bayesian tool to use and how to pass live HTTP data to it!
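To show the idea in miniature, here is a sketch of a plain naive Bayes classifier over HTTP payload tokens, written in Python. This is a simplified stand-in and not how OSBF-Lua works internally; the tokenizer, class names, and sample payloads are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Words plus single punctuation characters, so quotes and operators count.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Two-class (ham/spam) naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.docs = {"ham": 0, "spam": 0}

    def train(self, label: str, text: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def log_score(self, label: str, text: str) -> float:
        vocab = len(set(self.counts["ham"]) | set(self.counts["spam"]))
        total = sum(self.counts[label].values())
        # Smoothed class prior, then one smoothed likelihood per token.
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in tokenize(text):
            score += math.log((self.counts[label][tok] + 1) / (total + vocab + 1))
        return score

    def classify(self, text: str) -> str:
        spam, ham = self.log_score("spam", text), self.log_score("ham", text)
        return "spam" if spam > ham else "ham"


def build_demo() -> NaiveBayes:
    """Train on a handful of made-up payloads (illustrative data only)."""
    nb = NaiveBayes()
    for text in ("name=alice&city=paris", "q=red+shoes&page=2",
                 "comment=great post thanks"):
        nb.train("ham", text)
    for text in ("id=1' or '1'='1",
                 "id=1 union select user,password from users",
                 "q=1' or 1=1--"):
        nb.train("spam", text)
    return nb


if __name__ == "__main__":
    nb = build_demo()
    print(nb.classify("id=2 union/**/select name,pass from users"))
```

The point of the sketch is that classification depends on the overall mix of tokens rather than on one exact pattern, so a payload that slips past a regular expression can still look statistically like the attacks seen before.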

OSBF-Lua + ModSecurity's Lua API = Win

In order to extend ModSecurity's capabilities, we can use the flexible Lua API to add inspection logic. After some searching on the inter-webs, I was able to find the following Lua packages for Bayesian analysis:

Once you have installed moonfilter.lua, edit the file and remove "local" from the last line of the exported configuration variables so that it reads text = nil, as in this listing:

    ----- Exported configuration variables -------------------------
    -- Minimum absolute pR a correct classification must get not to
    -- trigger a reinforcement.
    threshold = 20
    -- Number of buckets in the database. The minimum value
    -- recommended for production is 94321.
    buckets = 94321
    -- Maximum text size, 0 means full document (default). A
    -- reasonable value might be 500000 (half a megabyte).
    max_text_size = 0
    -- Minimum probability ratio over the classes a feature must have
    -- not to be ignored. 1 means ignore nothing (default).
    min_p_ratio = 1
    -- Token delimiters, in addition to whitespace. None by default,
    -- could be set e.g. to ".@:/".
    delimiters = ""
    -- Whether text should be wrapped around (by re-appending the
    -- first 4 tokens after the last).
    wrap_around = true
    -- The directory where class database files are stored. Defaults
    -- to the current working directory (empty string). Note that the
    -- directory name MUST end in a path separator (typically '/' or
    -- '\', depending on your OS) in all other cases. Changing this
    -- value will only affect future calls to the |classes| command;
    -- it won't change the location of currently active classes.
    classdir = ""
    -- The text to classify/train as a string -- can be set explicitly
    -- if desired
    text = nil

This will allow us to pass HTTP payload data directly from ModSecurity. The moonrunner script is very useful to manage your SPAM training files. Here is an example usage for initially creating the HAM/SPAM training DB files:

The bolded lines are the commands entered. As you can see, these commands create the ham/spam classification files under the normal Apache logging directory. You should make sure that these files have read/write permissions for the Apache user:

Now that we have demonstrated using moonrunner to train/classify data as ham/spam, we next need to hook this into ModSecurity and the OWASP CRS.

Theory of Operation

The theory of operation is that we want regular, non-malicious users to help train our classifiers on HAM data. This is achieved by checking the CRS anomaly score and if it is 0 then we extract payload data and train OSBF's HAM classifier. On the flip-side, if an attacker starts sending SQLi attacks, the OWASP CRS will identify these initial attacks and train OSBF's SPAM classifier. Here is a visual representation:
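The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the logic: the class and method names are hypothetical, and the two lists stand in for the calls that would train OSBF's HAM and SPAM classifiers in the real ModSecurity setup.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingRouter:
    """Routes each payload to HAM or SPAM training by CRS anomaly score."""

    # Stand-ins for the HAM and SPAM training databases (assumption).
    ham: list[str] = field(default_factory=list)
    spam: list[str] = field(default_factory=list)

    def observe(self, anomaly_score: int, payload: str) -> str:
        # Score 0: no CRS rule matched, so train the payload as HAM.
        if anomaly_score == 0:
            self.ham.append(payload)
            return "ham"
        # Any positive score: the CRS flagged the request, train as SPAM.
        self.spam.append(payload)
        return "spam"


if __name__ == "__main__":
    router = TrainingRouter()
    router.observe(0, "name=alice&city=paris")
    router.observe(5, "id=1' or '1'='1")
    print(len(router.ham), len(router.spam))
```

In this arrangement the regular expression rules act as the labeling oracle, so the classifier is only as well trained as the CRS detections that feed it.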

For initial deployment, it is probably best to comment out the SecRuleScript line until you have let the ham/spam scripts run for a while and conducted some training. As an example, if a normal user were to submit a non-malicious web form that did not increase the CRS anomaly score, here is what the debug logging from the bayes_train_ham.lua script would look like:

After training has run for a period of time, you can then enable the SecRuleScript directive to run a classify check against requests that did not trigger any OWASP CRS alerts. The idea is that if the attacker was able to identify a successful evasion method against the regular expressions, the final attack payload would still be similar enough to the previous payloads that were caught and trained that the Bayesian analysis would catch it. For example, if we were to resend the example evasion payload shown at the beginning of the blog post, it would now trigger this Bayesian alert: