9.2 Filtering inputs

Filtering user data is one way we can improve the security of our web apps, because it lets us verify the legitimacy of incoming data. All input data is filtered so that malicious code or data is never mistakenly executed or stored. Most web application vulnerabilities arise from neglecting to filter input data and naively trusting it.

Our introduction to filtering data is divided into three steps:

identifying the data; we need to figure out where the data originated

filtering the data itself; we need to check what kind of data we have received

distinguishing between filtered (sanitized) and tainted data; once the data has been filtered, we can be assured that it is safe to use

Identifying data

"Identifying the data" is our first step because, most of the time, we don't know where the data originates from. Without this knowledge, we would be unable to properly filter it. "Data" here means everything that is supplied from outside our own code. For example, all data coming from clients is external, but clients that are users are not the only external sources of data. A database interface providing third-party data could also be an external data source.

Data that has been entered by a user is very easy to recognize in Go. After the user POSTs a form, we call r.ParseForm and then read all of the submitted data from r.Form. Other types of input are much harder to identify. For example, many of the elements in r.Header can be manipulated by the client. It is often difficult to tell which of these elements have been manipulated, so it's best to consider all of them tainted. The Accept-Charset header, read with r.Header.Get("Accept-Charset"), is also considered user input, even though it is typically set only by the browser.
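As a minimal sketch of what "identifying" means in practice, the example below gathers one form value and one header value and treats both as tainted. The names taintedInputs and demoRequest, the form field name and the example.com URL are assumptions made up for illustration; only r.ParseForm, r.Form and r.Header.Get come from the text above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// taintedInputs gathers values that the client controls. Everything
// returned here is untrusted until it has been filtered.
func taintedInputs(r *http.Request) (map[string]string, error) {
	// ParseForm fills r.Form from the URL query and, for a POST, the body.
	if err := r.ParseForm(); err != nil {
		return nil, err
	}
	tainted := map[string]string{
		"name":           r.Form.Get("name"),
		"Accept-Charset": r.Header.Get("Accept-Charset"),
	}
	return tainted, nil
}

// demoRequest (hypothetical helper) builds a POST request by hand,
// which shows that no browser or real form is needed to send one.
func demoRequest() *http.Request {
	body := url.Values{"name": {"attack"}}.Encode()
	r, err := http.NewRequest(http.MethodPost, "http://example.com/whoami", strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	r.Header.Set("Accept-Charset", "utf-8")
	return r
}

func main() {
	tainted, err := taintedInputs(demoRequest())
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	fmt.Println(tainted["name"], tainted["Accept-Charset"]) // attack utf-8
}
```

Both values arrive exactly as the client sent them, which is why neither should be used before it has been filtered.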

Filtering data

If we know the source of the data, we can filter it. "Filtering" is a somewhat formal term for a process that goes by many other names, such as input cleaning, validation and sanitization. Although these terms differ somewhat in meaning, they all refer to the same goal: preventing illegal data from making its way into your applications.

There are many ways to filter data, some of which are less secure than others. The best method is to check whether or not the data itself meets the legal requirements dictated by your application. When doing so, it's very important not to make any attempt at correcting the illegal data; this could allow malicious users to manipulate your validation rules for their own needs, defeating the purpose of filtering the data in the first place. History has proven that attempting to correct invalid data often leads to security vulnerabilities.

Let's take a look at an overly simple example for illustration purposes. Suppose that a banking system asks users to supply a secure, 6-digit password and validates the length of all passwords. One might naively write a validation rule that corrects passwords of illegal lengths: "If a password is shorter than the legal length, fill in the remaining digits with 0s". This simple rule would allow attackers to guess just the first few digits of a password to successfully gain access to user accounts!
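The password example can be sketched as a check that either accepts the input unchanged or rejects it. This is only an illustration under the assumptions of the example (exactly 6 ASCII digits); the names validatePassword and errInvalidPassword are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidPassword = errors.New("password must be exactly 6 digits")

// validatePassword accepts the input as-is or rejects it.
// It never pads, trims or otherwise repairs the value.
func validatePassword(pw string) error {
	if len(pw) != 6 {
		return errInvalidPassword
	}
	for _, c := range pw {
		if c < '0' || c > '9' {
			return errInvalidPassword
		}
	}
	return nil
}

func main() {
	for _, pw := range []string{"123456", "12", "12345a"} {
		fmt.Printf("%q valid: %v\n", pw, validatePassword(pw) == nil)
	}
}
```

With this rule, the short input "12" is rejected outright instead of being silently turned into "120000".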

We can use several libraries to help us to filter data:

The strconv package can help us convert strings input by users into specific types, since all of the values stored in r.Form are strings. Some common string conversions provided by strconv are Atoi, ParseBool, ParseFloat and ParseInt.

Go's strings package contains functions such as Trim, ToLower and ToTitle, which can help us obtain data in a specific format, according to our needs.

Go's regexp package can be used to handle cases which are more complex in nature, such as determining whether an input is an email address, a birthday, etc.
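The three packages above might be combined along the following lines. This is a rough sketch rather than a complete validation layer: the helper names, the 0 to 150 age range and the YYYY-MM-DD date format are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// datePattern is a deliberately simple YYYY-MM-DD shape check; it does
// not verify that the date actually exists.
var datePattern = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)

// parseAge converts a raw form string to an int and enforces a range.
func parseAge(raw string) (int, error) {
	age, err := strconv.Atoi(raw)
	if err != nil {
		return 0, err
	}
	if age < 0 || age > 150 {
		return 0, errors.New("age out of range")
	}
	return age, nil
}

// normalizeKeyword trims surrounding whitespace and lowercases the input.
func normalizeKeyword(raw string) string {
	return strings.ToLower(strings.TrimSpace(raw))
}

// looksLikeDate reports whether raw has the YYYY-MM-DD shape.
func looksLikeDate(raw string) bool {
	return datePattern.MatchString(raw)
}

func main() {
	age, err := parseAge("42")
	fmt.Println(age, err) // 42 <nil>

	_, err = parseAge("42abc")
	fmt.Println(err != nil) // true

	fmt.Println(normalizeKeyword("  GoLang ")) // golang
	fmt.Println(looksLikeDate("1990-01-15"), looksLikeDate("15/01/1990")) // true false
}
```

Note that parseAge rejects "42abc" instead of trying to salvage the leading digits, in keeping with the rule of not correcting illegal data.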

Filtering incoming data in addition to authentication can be quite effective. Let's add another technique to our repertoire, called whitelisting. Whitelisting means accepting only data that matches a list of known legitimate values and rejecting everything else, which makes it a good way of confirming the legitimacy of incoming data. Using this method, if an error occurs, it can only mean that legitimate data was treated as illegal, and not the opposite. Of course, we don't want to make any mistakes in our whitelist by falsely labelling legitimate data as illegal, but this scenario is much better, and thus much more secure, than illegal data being labelled as legitimate.

Distinguishing between filtered and tainted data

If you have completed the above steps, the job of filtering data has basically been completed. However, when writing web applications, we also need to distinguish between filtered and tainted data, because doing so can guarantee the integrity of our data filtering process without affecting the input data. Let's put all of our filtered data into a dedicated map variable called CleanMap. In Go, this map should belong to a single request rather than be shared between requests, because the server handles requests concurrently. Then, two important steps are required to prevent contamination via data injection:

Each request must initialize CleanMap as an empty map.

Prevent any variable named CleanMap that comes from an external data source from being introduced into the app.

Consider a form containing a select element called name that offers three options: astaxie, herry and marry. In dealing with this type of form, it can be very easy to make the mistake of thinking that users will only be able to submit one of the three select options. In fact, POST operations can easily be simulated by attackers. For example, by submitting the same form with name = attack, a malicious user could introduce illegal data into our system. We can use a simple whitelist to counter these types of attacks.
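A whitelist check for this form might look like the sketch below. The helper name filterName and the allowedNames map are assumptions made for illustration; in a real handler, the name would come from r.Form after a call to r.ParseForm.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedNames is the whitelist: the three options offered by the select.
var allowedNames = map[string]bool{
	"astaxie": true,
	"herry":   true,
	"marry":   true,
}

// filterName returns a fresh CleanMap for one request. The name is
// copied into it only if it matches a whitelist entry exactly.
func filterName(name string) map[string]interface{} {
	CleanMap := make(map[string]interface{})
	if allowedNames[name] {
		CleanMap["name"] = name
	}
	return CleanMap
}

func main() {
	// Stand-ins for r.Form: one honest submission and one forged POST.
	forms := []url.Values{
		{"name": {"herry"}},
		{"name": {"attack"}},
	}
	for _, form := range forms {
		CleanMap := filterName(form.Get("name"))
		name, ok := CleanMap["name"]
		fmt.Println(name, ok)
	}
}
```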

Such a check initializes a CleanMap variable, and a name is assigned only after it has been checked against an internal whitelist of legitimate values (astaxie, herry and marry in this case). Because only checked data is stored in CleanMap, you can be sure that CleanMap["name"] holds a validated value, and any code wishing to access this value can freely do so. We can also add an else branch to the whitelist check to deal with illegal data, for example by displaying the form again with an error. Do not try to be too accommodating though, or you run the risk of accidentally contaminating your CleanMap.

Filtering data against a set of known, legitimate values in this way is very effective. Another method is to use regexp to check whether or not incoming data consists only of legal characters; however, this would be ineffectual in the above case, where we require that the name be one of the options from the select. It is useful in other situations: for example, you may require that user names only consist of letters and numbers.
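A character check of this kind could be sketched as follows. The helper name filterUsername and the username field are assumptions for the example, and the pattern accepts ASCII letters and digits only.

```go
package main

import (
	"fmt"
	"regexp"
)

// usernamePattern matches one or more ASCII letters or digits,
// anchored to the start and end of the input.
var usernamePattern = regexp.MustCompile(`^[a-zA-Z0-9]+$`)

// filterUsername returns a fresh CleanMap that contains the username
// only if every character in it is legal.
func filterUsername(username string) map[string]interface{} {
	CleanMap := make(map[string]interface{})
	if usernamePattern.MatchString(username) {
		CleanMap["username"] = username
	}
	return CleanMap
}

func main() {
	for _, u := range []string{"astaxie2012", "bad name!", ""} {
		_, ok := filterUsername(u)["username"]
		fmt.Printf("%q accepted: %v\n", u, ok)
	}
}
```

Compiling the pattern once with regexp.MustCompile avoids recompiling it on every request, which is a design choice of this sketch rather than a requirement.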

Summary

Data filtering plays a vital role in the security of modern web applications. Most security vulnerabilities are the result of improperly filtering data or neglecting to properly validate it. The previous section dealt with CSRF attacks and the next two will introduce XSS attacks and SQL injection, so there was no natural place within them to cover a topic as important as data sanitization. That is why we paid special attention to it in this section.