Blocking bad bots with Fail2ban

Fail2ban is a versatile security tool. While it is primarily used for preventing brute-force attacks against SSH, it can also be used for protecting other services.

Some bots scan the internet and send thousands of requests to web servers in the hope of finding vulnerabilities. This post discusses blocking such bots with Fail2ban.

We assume that you are using Apache as a web server. However, these instructions can be easily adjusted for nginx or any other web server.

Keep in mind that Fail2ban is not a Web Application Firewall (WAF) and cannot stop malicious requests as they arrive. Fail2ban acts by monitoring logs, so at least one malicious request must be logged before it can take any action.

What is a bad bot, anyway?

In this post, we will focus on blocking bots that do one of the following things:

Fail2ban basics

At the heart of Fail2ban is a set of jails. Put simply, a jail tells Fail2ban to watch a set of logs and to apply a filter to them each time a log changes. When the number of matches for the filter from a given host reaches the maximum allowed by the jail, the action specified in the jail (typically a ban) is taken against that host.
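To make the counting logic concrete, here is a toy model in Python. It is only a sketch of the idea, not Fail2ban's implementation: the /admin.php filter, the log lines, and the names FILTER, MAX_MATCHES and hosts_to_ban are all made up for illustration, and the sketch ignores the time window that a real jail also applies.

```python
import re
from collections import defaultdict

# Hypothetical filter: flag any request line that asks for /admin.php.
FILTER = re.compile(r'^(?P<host>\S+) .* "GET /admin\.php HTTP/1\.[01]"')
MAX_MATCHES = 3  # stand-in for the jail's maximum number of matches


def hosts_to_ban(log_lines, max_matches=MAX_MATCHES):
    """Return hosts in the order in which they reach max_matches filter hits."""
    hits = defaultdict(int)
    banned = []
    for line in log_lines:
        m = FILTER.search(line)
        if not m:
            continue  # the filter did not match, nothing to count
        host = m.group("host")
        hits[host] += 1
        if hits[host] == max_matches:
            banned.append(host)  # a real jail would run its action here
    return banned


# Made-up access log entries (documentation-only IP addresses).
sample_log = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /admin.php HTTP/1.1" 404 196',
    '198.51.100.2 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.7 - - [01/Jan/2024:00:00:03 +0000] "GET /admin.php HTTP/1.1" 404 196',
    '198.51.100.2 - - [01/Jan/2024:00:00:04 +0000] "GET /admin.php HTTP/1.1" 404 196',
    '203.0.113.7 - - [01/Jan/2024:00:00:05 +0000] "GET /admin.php HTTP/1.1" 404 196',
]

print(hosts_to_ban(sample_log))
```

With the sample log above, only 203.0.113.7 reaches three matches; 198.51.100.2 matches the filter once and is left alone.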

Thus, you need to define two things: a filter, and a jail. The jail will be configured to look at Apache’s logs to detect malicious requests.

Defining the filters

A filter is simply a collection of Python regular expressions that are matched against each line of a log. Here, we need to define filters for the criteria we described above.

Notice that the request line GET /robots.txt HTTP/1.1 is enclosed in double quotes. When designing such rules yourself, take care to ensure that only the request line can match, and not some other part of the log entry such as the referrer or the user agent. Otherwise, you risk blocking legitimate users.
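The effect of the quotes can be checked quickly in Python. The sketch below uses two hypothetical patterns and two made-up log lines in Apache's combined format; (?P<host>\S+) stands in for Fail2ban's <HOST> tag, and neither pattern is taken from a real filter file.

```python
import re

# Anchored from the start of the line up to the quoted request line.
QUOTED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "GET /robots\.txt HTTP/1\.1"'
)
# Careless variant: the text may appear anywhere in the log entry.
UNQUOTED = re.compile(r'^(?P<host>\S+) .*GET /robots\.txt HTTP/1\.1')

# A client that really requested /robots.txt.
bot_line = (
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"'
)
# A visitor who requested /index.html; only the referrer contains the text.
user_line = (
    '198.51.100.2 - - [01/Jan/2024:00:00:02 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 '
    '"http://example.com/?q=GET /robots.txt HTTP/1.1" "Mozilla/5.0"'
)

for name, pattern in (("quoted", QUOTED), ("unquoted", UNQUOTED)):
    for label, line in (("bot", bot_line), ("user", user_line)):
        print(name, label, bool(pattern.match(line)))
```

The anchored, quoted pattern matches only the first line, while the careless one also matches the second and would count the ordinary visitor as a bot.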

SQL injection payloads generally contain strings of the form union select(...) or select concat (...). Thus, you could try to match this pattern with the following regular expression:

The <HOST> tag marks where the client's IP address appears in the log entry, and the (?i) makes the regular expression case-insensitive.

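The [^"] in the regex matches any character except a double quote, so the match cannot run past the closing quote of the quoted request. This helps to ensure that the regular expression matches just the request line and nothing else. The (?:%%2[8C]|[,(]) specifies that the union select or select concat is followed by a comma (,) or an opening parenthesis, either literally or in percent-encoded form (%2C or %28). The percent sign is doubled because Fail2ban's configuration files treat a single % as the start of an interpolation.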
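The pieces described above can be put together and tried out in plain Python. The pattern below is an illustrative reconstruction built from those pieces, not necessarily the exact expression used in a real filter, and the log lines are made up. Outside a Fail2ban filter file, <HOST> is replaced by a named group and %% is written as a single %.

```python
import re

# A Fail2ban-style version of this sketch might look like (assumption):
#   (?i)^<HOST> [^"]*"[^"]*(?:union[^"]+select|select[^"]+concat)[^"]*(?:%%2[8C]|[,(])
SQLI = re.compile(
    r'(?i)^(?P<host>\S+) [^"]*"[^"]*'          # host, then the opening quote of the request
    r'(?:union[^"]+select|select[^"]+concat)'  # the SQL keywords, inside the same quotes
    r'[^"]*(?:%2[8C]|[,(])'                    # followed by "," or "(", literal or encoded
)

union_attack = (
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] '
    '"GET /item.php?id=1%20UNION%20SELECT%201,2,3 HTTP/1.1" 200 512'
)
concat_attack = (
    '203.0.113.8 - - [01/Jan/2024:00:00:02 +0000] '
    '"GET /item.php?id=1+and+(select+concat%28user%2Cpass%29+from+users) HTTP/1.1" 200 512'
)
# The suspicious text sits in the user agent, not in the request line.
benign = (
    '198.51.100.2 - - [01/Jan/2024:00:00:03 +0000] '
    '"GET /index.html HTTP/1.1" 200 1024 "-" "union select(1)"'
)

for line in (union_attack, concat_attack, benign):
    m = SQLI.match(line)
    print(m.group("host") if m else "no match")
```

Because every wildcard is [^"] rather than a dot, the keywords have to appear inside the first quoted field, so the third line is not matched. Before deploying a real filter, it is worth running it against your own logs with the fail2ban-regex tool that ships with Fail2ban.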