Introduction

In this article, we propose a way to secure C# programs by enforcing the verification of potentially dangerous data from the outside world. Our simple, Ruby-like solution allows a developer to "taint" a C# object by encapsulating it in a generic container class. The container refuses access to the target object until an "untaint" method has been invoked on it successfully, i.e., until the object has been deemed safe for use in a vulnerable environment. Defining which conditions allow the object to be cleared is left to the discretion of the developer, and the check can be repeated if the data represents a threat to more than one part of a C# program.

Data from the Outside World Considered Harmful

Accepting data entry into an application (or simply using data that originates outside the application) is a dangerous operation, as an attacker can exploit improper handling to compromise or penetrate a system. For example, C and C++ programs are vulnerable to stack-based buffer overflow attacks that exploit unchecked array accesses: if array bounds are not checked, it is relatively easy to send more data to an array than it expects (and was designed to handle), which can lead to the execution of arbitrary code provided by the attacker.

Applications that use an SQL database for data persistence face a different, yet potentially destructive, issue: SQL injection attacks. In an SQL injection, an attacker supplies SQL code instead of, or as part of, an input that the program expects from the user, such as a username. That input would most likely be used as a string to complete a predefined (legitimate) SQL statement that fetches the user from the database. That possibility raises two issues. First, data provided by an untrusted source has to be identified as a potential threat because of its origin. Second, untrusted data has to be cleared from the perspective of the code that will use it, i.e., only the code that consumes the untrusted data knows its own weaknesses and whether it can use the data safely.

An Example of SQL Injection

This article is by no means intended to provide a thorough description of SQL injection. This section only introduces the threat so that the reader can understand how our solution works, and it can be skipped by readers already familiar with SQL injection. SQL injection is easily explained through a classic example, shown below. It features a typical C# database broker that is part of an authentication routine; the broker uses a username supplied by the user of the application as the selection criterion on a Users table.
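The original listing is not preserved in this text, so the following is only a minimal sketch of the kind of query construction such a broker might perform. The class name `UserBroker`, the method name `BuildFindUserQuery`, and the `Username` column are assumptions made for illustration; only the Users table comes from the article.

```csharp
using System;

public static class UserBroker
{
    // Hypothetical helper: builds the lookup statement by naive string
    // concatenation. Deliberately vulnerable; shown only to illustrate the flaw.
    public static string BuildFindUserQuery(string username)
    {
        // The user-supplied value is pasted between the quotes unchanged.
        return "SELECT * FROM Users WHERE Username = '" + username + "'";
    }
}
```

With a well-behaved input such as `alice`, the statement selects the expected row. The weakness is that nothing prevents the input from closing the quote itself.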

The SQL statement clearly expects a string that contains a username. If that is the case, the expected user is selected and the routine behaves as intended. However, if the attacker suspects that an SQL database is being used, she can provide input that manipulates the query into returning rows where none should have been selected. A classic approach consists of appending a condition that is always true, such as:

' OR '1'='1' -- '

to the initial query, in order to return all rows from the Users table. Depending on how the rest of the method is designed, this can lead to a well-formed User instance being returned. As we can see in the code below, the string is appended to the query, resulting in a valid SQL statement. How the final SQL statement is interpreted is also shown.
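As a hedged illustration of the result (the original listing is missing), the sketch below feeds the payload quoted above into the same kind of naive concatenation. `InjectionDemo`, `BuildQuery`, and the `Username` column are hypothetical names.

```csharp
using System;

public static class InjectionDemo
{
    // The payload quoted in the article, supplied where a username is expected.
    public const string Payload = "' OR '1'='1' -- '";

    // Same naive concatenation a vulnerable broker would perform
    // (table and column names are assumptions).
    public static string BuildQuery(string username)
    {
        return "SELECT * FROM Users WHERE Username = '" + username + "'";
    }
}

// InjectionDemo.BuildQuery(InjectionDemo.Payload) produces:
//   SELECT * FROM Users WHERE Username = '' OR '1'='1' -- ''
// The database reads the condition as: Username = ''  OR  '1'='1'.
// Everything after "--" is a comment, so the WHERE clause is true for every row.
```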

SQL injection is certainly not limited to collecting information about a database schema or gaining unintended access to a system. An attacker can also use data manipulation language (DML) statements to update or delete rows from a table, or a data definition language (DDL) statement to drop a table altogether. It should also be noted that injection attacks are by no means restricted to SQL; they can occur whenever a string is used as part of a system call.

Marking an Object as a Potential Threat

A crucial aspect of handling input data is knowing whether the data can be trusted, based on its origin.

The literature shows different solutions for tracking the safety status of an object. Languages such as Ruby and Perl implement Taint checking, an elegant way of keeping track of the level of trust that can be placed in an object. Taint checking is enforced by marking an object as tainted if it comes from an untrusted source. That status can be transmitted to any other object that touches it. Before it can be used in a vulnerable execution environment, the tainted object has to be analyzed to make sure it poses no threat, and then explicitly marked as cleared by the developer.

In Ruby, taint checking is closely associated with the SAFE level a Ruby program is running at, 0 being the most lenient and 4 the most paranoid. Explaining the particularities of each level is beyond the scope of this article, but every level above 0 forces explicit taint checking of externally supplied data [1]. Whether an object is considered tainted can be checked through the tainted? method of the Object superclass. While an object is considered tainted, the Ruby script it belongs to is forbidden from performing certain operations, depending on the SAFE level.

Ruby makes it easy to clean a tainted object. Unless the SAFE mode is set to its highest levels, any object can be cleared by invoking the untaint method on it, which takes no parameters. Ruby does not force any preliminary check before that method is invoked; that is left to the discretion of the developer.

Perl provides a relatively simple mechanism to enforce Taint checking called Taint mode. Perl enters that mode automatically in some circumstances, such as when a program runs with differing real and effective user or group IDs (for example, a setuid script) [2]. Taint mode can also be entered explicitly by passing the -T argument on the command line when starting the Perl interpreter. Once Taint mode is entered, the Perl interpreter stays in that mode for the remainder of the script (ibid.). When Taint mode is on, using tainted data in a way that could be dangerous triggers a fatal "Insecure dependency" error. A dangerous operation would be, for example, to write to a file whose name is held in a tainted variable or, even worse, to execute the content of the variable as a system call.

Unlike Ruby, Perl does not provide an explicit "untaint" method. Untainting is performed by matching the tainted data against a regular expression: the resulting captured groups are considered untainted.

Our Solution

Having recently been tasked with enforcing the safety of an authentication routine in a C# program, we were disappointed to discover that no such mechanism seems to exist for .NET, and for C# in particular.

We thus propose to emulate part of the Taint checking solution in C# by using a generic Tainted container class that encapsulates a target object. That class provides methods to check the status of the target (whether it is tainted or not) as well as untainting and tainting it again. The Tainted class is shown below:

The basic idea behind this class is that whenever data is obtained from outside the program, the object that holds the data (hereafter called the target) has to be encapsulated in an instance of the Tainted class. From that moment on, the target is considered tainted. Access to it is only possible through a public Target getter property. If the target has been untainted, the getter returns it; otherwise, the target is still considered unsafe to use. In that case, in order to shield the code that needs the target from the threat it represents, the getter raises a TaintException and does not return the target.

That condition is a crucial element of this solution: the target has to be cleared before it can be freely accessed. This can seem paradoxical, as the target has to be accessed in order to be analyzed. The idea is thus to hand the tainted target only to a method designed to verify it. That method (hereafter called the untainter) is provided as a parameter to the Untaint method. The untainter's signature has to match that of the IsCleanUntaintTreatmentMethod delegate: it receives the target as a parameter and must return true if the target is safe (and must therefore be declared untainted), or false otherwise.
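Since the original listing is not preserved here, the following is a minimal reconstruction based only on the description above, not the authors' actual code. The names Tainted, Target, Untaint, Taint, TaintException, and IsCleanUntaintTreatmentMethod come from the article; making the delegate generic, the `IsTainted` property, and the choice to re-taint the target when a later check fails are assumptions.

```csharp
using System;

// Raised when code tries to read a target that is still tainted.
public class TaintException : Exception
{
    public TaintException(string message) : base(message) { }
}

// Signature every untainter must match: true means the target is safe.
// (Assumed to be generic so that it can follow the container's type.)
public delegate bool IsCleanUntaintTreatmentMethod<T>(T target);

public class Tainted<T>
{
    private readonly T target;
    private bool isTainted = true; // every wrapped object starts out tainted

    public Tainted(T target)
    {
        this.target = target;
    }

    public bool IsTainted
    {
        get { return isTainted; }
    }

    // The only way to reach the target: throws while it is tainted.
    public T Target
    {
        get
        {
            if (isTainted)
                throw new TaintException("The target is tainted; call Untaint first.");
            return target;
        }
    }

    // Hands the raw target to the untainter only; the flag mirrors its verdict.
    public bool Untaint(IsCleanUntaintTreatmentMethod<T> untainter)
    {
        if (untainter == null) throw new ArgumentNullException(nameof(untainter));
        isTainted = !untainter(target);
        return !isTainted;
    }

    // Marks the target as unsafe again, e.g. before another sensitive routine.
    public void Taint()
    {
        isTainted = true;
    }
}
```

Keeping the raw value private and exposing it only to the delegate is what resolves the apparent paradox: the untainter can inspect the data, while every other caller sees either a cleared value or an exception.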

The untainters then have to be developed. Our StringUntainter class contains two of them. The first, IsFreeOfSQLInjectionUntainter, receives the target string and returns true if the string does not contain any of the substrings that are commonly used in SQL injection attacks. We took those SQL keywords and characters from a website [3]. The second, NOPUntainter, simply returns true and is used when there is no need to actually verify a target string, such as when the data is hashed before being used. We can take a look at the code below:
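The listing itself is missing from this text, so here is a hedged sketch of what such a class could look like. The class and method names come from the article; the blacklist below is purely illustrative and is not the list taken from [3].

```csharp
using System;

public static class StringUntainter
{
    // Illustrative fragments only; the article's actual list (from [3])
    // is not reproduced here.
    private static readonly string[] SuspiciousFragments =
    {
        "'", "\"", ";", "--", "/*", "*/", "xp_"
    };

    // Returns true when none of the suspicious fragments occurs in the target.
    public static bool IsFreeOfSQLInjectionUntainter(string target)
    {
        if (target == null) return false;
        foreach (string fragment in SuspiciousFragments)
        {
            if (target.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0)
                return false;
        }
        return true;
    }

    // Accepts anything: for data that is transformed (e.g. hashed) before use.
    public static bool NOPUntainter(string target)
    {
        return true;
    }
}
```

A blacklist of this kind rejects legitimate values such as `O'Brien` and can miss vectors it does not list, so it should be read as a demonstration of the untainter contract rather than as a complete defense.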

We can now put everything together in an example. The SignIn method shown below receives two tainted strings: a username and a password. We pass the username to the User database broker, which in turn passes the IsFreeOfSQLInjectionUntainter method of the StringUntainter class to the Untaint method of the taintedUsername parameter. The broker then checks that the object is no longer flagged as tainted; if it still is, the broker raises an SQLInjectionException, which the SignIn method knows how to handle.

Once the username is untainted, its value can be used in an SQL statement to fetch a User from the database. In our example, if the username has been found and a user fetched, we then pass the tainted password to the HashPasswordForSignIn method. Since that method applies a hashing algorithm to the tainted string, it is not susceptible to SQL injection and the string does not need any further analysis. We thus use the NOPUntainter of the StringUntainter class, which untaints the password, and then hash the password.
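The flow described above can be sketched as follows. This is an approximation under stated assumptions, not the original listing: the database is replaced by an in-memory dictionary, `Authentication`, `Users`, and `FindPasswordHash` are hypothetical names, the Tainted container is a compact stand-in, and the unsalted SHA-256 hash is used only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class TaintException : Exception { public TaintException(string m) : base(m) { } }
public class SQLInjectionException : Exception { public SQLInjectionException(string m) : base(m) { } }

public delegate bool IsCleanUntaintTreatmentMethod<T>(T target);

// Compact stand-in for the article's Tainted container (assumed shape).
public class Tainted<T>
{
    private readonly T target;
    public bool IsTainted { get; private set; } = true;
    public Tainted(T target) { this.target = target; }
    public T Target => IsTainted ? throw new TaintException("Target is still tainted.") : target;
    public void Untaint(IsCleanUntaintTreatmentMethod<T> untainter) { IsTainted = !untainter(target); }
}

public static class StringUntainter
{
    // Simplified check; the article's real list of fragments is not reproduced.
    public static bool IsFreeOfSQLInjectionUntainter(string s) =>
        s != null && !s.Contains("'") && !s.Contains("--") && !s.Contains(";");
    public static bool NOPUntainter(string s) => true;
}

public static class Authentication
{
    // In-memory stand-in for the Users table: username -> password hash.
    public static readonly Dictionary<string, string> Users = new Dictionary<string, string>();

    // Broker: clears the username from its own point of view before any lookup.
    public static string FindPasswordHash(Tainted<string> taintedUsername)
    {
        taintedUsername.Untaint(StringUntainter.IsFreeOfSQLInjectionUntainter);
        if (taintedUsername.IsTainted)
            throw new SQLInjectionException("Username rejected by the untainter.");
        string hash;
        return Users.TryGetValue(taintedUsername.Target, out hash) ? hash : null;
    }

    // Hashing neutralizes the string, so the NOP untainter is sufficient here.
    public static string HashPasswordForSignIn(Tainted<string> taintedPassword)
    {
        taintedPassword.Untaint(StringUntainter.NOPUntainter);
        using (SHA256 sha = SHA256.Create())
            return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(taintedPassword.Target)));
    }

    public static bool SignIn(Tainted<string> taintedUsername, Tainted<string> taintedPassword)
    {
        try
        {
            string storedHash = FindPasswordHash(taintedUsername);
            return storedHash != null && storedHash == HashPasswordForSignIn(taintedPassword);
        }
        catch (SQLInjectionException)
        {
            return false; // treat an injection attempt as a failed sign-in
        }
    }
}
```

Note that each sensitive routine chooses its own untainter: the broker insists on a real check, while the hashing routine explicitly documents, through NOPUntainter, that no check is needed.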

Points of Interest

In this article, we presented a simple solution for using Taint checking in C#. Our solution decouples the state of the object from the implementation of the cleansing algorithms, which leads to a generic Tainted class. That class can be used with any object, which makes it very reusable: it is not limited to primitives and works equally well with complex objects (such as File objects). The delegate approach provides type safety to the implementation of the untainting algorithm and is an interesting example of the Strategy design pattern. The untainting can then be done from the point of view of the code carrying the risk; in our case, the database broker "knows" what could represent a threat to itself and makes sure that the tainted username does not. Although we did not illustrate it, the string, once untainted, could have been tainted again by invoking the Taint method, should that object be used by some other sensitive routine later. It should also be noted that keeping the untainting close to the code that uses the sensitive data increases cohesion.

We could have selected a more robust approach to untainting that would have imitated Perl: removing the possibility of untainting an object altogether, and converting the Target property into a method that takes a delegate and returns the target object only if the delegate returns true. Although that approach would have reduced the risk of handing an already untainted object to a sensitive segment of code or module, it would have decreased performance by executing the cleansing algorithm every time the target is accessed (if the code making the access is implemented naively). Moreover, it would not have provided any more protection than our solution if a NOP cleanser were used.
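A minimal sketch of that alternative, assuming hypothetical names (`StrictTainted`, `GetTarget`) and a plain `Func<T, bool>` in place of the article's delegate, could look like this:

```csharp
using System;

public class TaintException : Exception
{
    public TaintException(string message) : base(message) { }
}

// Hypothetical variant: there is no untainted state at all.
public class StrictTainted<T>
{
    private readonly T target;

    public StrictTainted(T target)
    {
        this.target = target;
    }

    // Every access re-runs the caller's check; nothing is remembered.
    public T GetTarget(Func<T, bool> untainter)
    {
        if (untainter == null) throw new ArgumentNullException(nameof(untainter));
        if (!untainter(target))
            throw new TaintException("The target failed the caller's check.");
        return target;
    }
}
```

The cost mentioned above is visible here: a caller that reads the value in a loop pays for the check on each iteration unless it stores the returned value locally.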

Our solution, while useful, is certainly not as secure as it would be if Taint checking were integrated natively into .NET. The developer has to know the origin of the data and manually encapsulate it in an instance of the Tainted class, instead of that operation being enforced by the language itself.

Our authentication example is also deliberately simple, to make it easier to understand. It is inherently insecure, as the username and password are sent over the network in cleartext. A more modern approach would hash or encrypt that information so that it can be safely transmitted over a public or unsecured network.

An opposite approach to Taint checking is called Trademarking [4, p.18]. Trademarking consists of explicitly whitelisting data by keeping a list of objects that have been trademarked (deemed safe) by an ApplyTrademark method [5]. Sensitive code that wishes to use a trademarked object has to make sure that the object has indeed been trademarked by calling a VerifyTrademark method, which returns true if the object is safe to use (ibid.). A disadvantage of that approach is that it makes it more difficult to ensure that an object that has been deemed safe is indeed safe to use in a particular context, which can be problematic if the data is trademarked in one module and used in another. Imagine data that was initially analyzed and considered safe in a module for use in an SQL database context. That data is then handed to a module that uses it as part of a system call. In that situation, even if the object has been trademarked for the first context, nothing guarantees that it is safe for the second one. In our approach, however, the object, after having been untainted for use by the first module, can be explicitly tainted again before being handed to the second module, which eliminates the risk of false negatives.
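The two-module scenario can be sketched as follows. The container is a compact stand-in for the article's Tainted class, and both checks in the hypothetical `Untainters` class are simplified assumptions chosen only to show that a value cleared for one context can fail in another.

```csharp
using System;
using System.Linq;

public class TaintException : Exception { public TaintException(string m) : base(m) { } }

public delegate bool IsCleanUntaintTreatmentMethod<T>(T target);

// Compact stand-in for the article's Tainted container (assumed shape).
public class Tainted<T>
{
    private readonly T target;
    public bool IsTainted { get; private set; } = true;
    public Tainted(T target) { this.target = target; }
    public T Target
    {
        get
        {
            if (IsTainted) throw new TaintException("Target is tainted.");
            return target;
        }
    }
    public void Untaint(IsCleanUntaintTreatmentMethod<T> untainter) { IsTainted = !untainter(target); }
    public void Taint() { IsTainted = true; }
}

public static class Untainters
{
    // Hypothetical check used by the module that talks to the SQL database.
    public static bool IsFreeOfSQLInjection(string s) =>
        !s.Contains("'") && !s.Contains("--") && !s.Contains(";");

    // Hypothetical, stricter check used by the module that builds a system call:
    // only letters, digits, '.', '_' and '-' are allowed.
    public static bool IsSafeForShell(string s) =>
        s.Length > 0 && s.All(c => char.IsLetterOrDigit(c) || c == '.' || c == '_' || c == '-');
}
```

For a value such as `report$(whoami)`, the first module's call to `Untaint(Untainters.IsFreeOfSQLInjection)` clears it. If the caller then invokes `Taint()` before handing the object to the second module, that module's `Untaint(Untainters.IsSafeForShell)` leaves it tainted, and reading `Target` throws instead of letting the value reach the system call.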

Comments and Discussions

Would the following solution be better-suited for a statically-typed language?

It seems to me that most code is designed either to deal with tainted data, or untainted data, not both. If that is the case, then the goal is just to make sure that the tainted data is not given to untainted code by accident. A simple and efficient way to express this would be with a "tainted" struct:

Code designed to work with tainted data would use Tainted<T> and access the data through Value. Clean() is used to alter the value to make it safe (e.g. by escaping or removing unsafe characters if T is string); or, alternately, the code designed to handle tainted objects could verify the Value itself and call Cleaned() to indicate that the Value is safe. Calling Cleaned() is no different than simply accessing Value directly; it would just serve as a coding convention that declares "this Value is safe", just as we may use "1.0" instead of "1" to declare "this is floating point".
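The struct from this comment is not preserved in the text. A hedged guess at its shape, based only on the description above, is sketched below; the member names Value, Clean, and Cleaned come from the comment, while the signatures (in particular a sanitizer function passed to Clean) are assumptions.

```csharp
using System;

// A marker type: the static type alone says "this data is not yet trusted".
public struct Tainted<T>
{
    // Raw, possibly unsafe value, for code written to handle tainted data.
    public readonly T Value;

    public Tainted(T value)
    {
        Value = value;
    }

    // Returns a sanitized copy produced by the supplied function
    // (for strings, e.g. escaping or removing unsafe characters).
    public T Clean(Func<T, T> sanitizer)
    {
        if (sanitizer == null) throw new ArgumentNullException(nameof(sanitizer));
        return sanitizer(Value);
    }

    // Same as reading Value; by convention it declares "I verified this myself".
    public T Cleaned()
    {
        return Value;
    }
}
```

In this design the compiler does the enforcement: a method that takes a plain `string` cannot receive a `Tainted<string>` by accident, and no runtime flag or exception is involved.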

Or maybe I misunderstand how tainted data should be processed. When is it necessary or useful to have a single piece of code deal with multiple data items, some of which are tainted and some of which are not?

First of all, I did rate your article a 5 because I found it interesting. I am not a Ruby or Perl developer so the concept of tainting/untainting was unfamiliar to me.

That being said, I think a developer would be better served by using parameterized SQL to prevent SQL injection. I think I would like to have seen a more worthy scenario (other than SQL injection) which shows how to implement tainting/untainting within the article.

As the author of that post, I can tell you what I meant by the example being contrived. I used a simple example that could have been written without 'dynamic SQL', i.e., string concatenation. The intent was to show a simple example that conveys the concept and the potential risk.

There are occasions where dynamically building a SQL statement is the only or most convenient option. In these cases, the developer needs to ensure that they have taken the necessary precautions against SQL injection, even though the client code is calling the procedure using a parameterized query.