December 2010

12/28/2010

The preferred approach to validating input is to constrain what you allow from the beginning. It is much easier to validate data for known valid types, patterns, and ranges than it is to validate data by looking for known bad characters. When you design your application, you know what your application expects. The range of valid data is generally a more finite set than potentially malicious input. However, for defense in depth you may also want to reject known bad input and then sanitize the input.

To create an effective input validation strategy, be aware of the following approaches and their tradeoffs:

Constrain input.

Validate data for type, length, format, and range.

Reject known bad input.

Sanitize input.

Constrain Input

Constraining input is about allowing good data. This is the preferred approach. The idea here is to define a filter of acceptable input by using type, length, format, and range. Define what is acceptable input for your application fields and enforce it. Reject everything else as bad data.

Constraining input may involve setting character sets on the server so that you can establish the canonical form of the input in a localized way.
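
The allow-list idea above can be sketched in a few lines of Python. The "city" field and its permitted pattern are illustrative assumptions, not requirements from the text:

```python
import re

# Allow-list validation: accept only the expected characters and length.
# The "city" field and its pattern are illustrative assumptions.
CITY_PATTERN = re.compile(r"^[A-Za-z][A-Za-z \-]{0,39}$")

def is_valid_city(value: str) -> bool:
    """Return True only when the input matches the constrained form."""
    return CITY_PATTERN.fullmatch(value) is not None
```

Anything outside the defined set -- script tags, SQL fragments, control characters -- simply fails the match, with no need to enumerate bad characters.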

Validate Data for Type, Length, Format, and Range

Use strong type checking on input data wherever possible, such as in the classes used to manipulate and process the input data and in data access routines. For example, use parameterized stored procedures for data access to benefit from strong type checking of input fields.

String fields should also be length-checked and, in many cases, checked for appropriate format. For example, ZIP codes, social security numbers, and so on have well-defined formats that can be validated using regular expressions. Thorough checking is not only good programming practice; it makes it more difficult for an attacker to exploit your code. The attacker may get through your type check, but the length check may make executing his favorite attack more difficult.
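
As a sketch of these checks (the field names and the 0-120 age range are assumptions for illustration), a ZIP code or social security number can be constrained by format with an anchored regular expression, and a numeric field by type and range:

```python
import re

# Illustrative validators; the fields and the 0-120 age range are assumptions.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")     # US ZIP or ZIP+4
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # NNN-NN-NNNN

def validate_zip(value: str) -> bool:
    """Format and length are checked together by the anchored pattern."""
    return ZIP_RE.fullmatch(value) is not None

def validate_ssn(value: str) -> bool:
    return SSN_RE.fullmatch(value) is not None

def validate_age(value: str) -> bool:
    """Strong typing (int conversion) plus a range check."""
    try:
        age = int(value)
    except ValueError:
        return False
    return 0 <= age <= 120
```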

Reject Known Bad Input

Deny "bad" data; although do not rely completely on this approach. This approach is generally less effective than using the "allow" approach described earlier and it is best used in combination. To deny bad data assumes your application knows all the variations of malicious input. Remember that there are multiple ways to represent characters. This is another reason why "allow" is the preferred approach.

While the "deny" approach is useful for applications that are already deployed and when you cannot afford to make significant changes, it is not as robust as the "allow" approach because bad data, such as the patterns used to identify common attacks, does not remain constant. Valid data remains constant, while the range of bad data may change over time.
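
A deny-list can still add a layer of defense behind the allow-list. The fragments below are purely illustrative and deliberately incomplete, which is exactly the weakness described above:

```python
# A deny-list used only as a secondary check behind an allow-list.
# These fragments are illustrative and deliberately incomplete.
KNOWN_BAD_FRAGMENTS = ("<script", "--", ";", "xp_")

def contains_known_bad(value: str) -> bool:
    """Case-insensitive scan for known-bad substrings."""
    lowered = value.lower()
    return any(fragment in lowered for fragment in KNOWN_BAD_FRAGMENTS)
```

Note that this catches only the literal forms listed; alternate encodings of the same characters sail through, which is why the allow-list remains the primary control.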

Sanitize Input

Sanitizing is about making potentially malicious data safe. It can be helpful when the range of input that is allowed cannot guarantee that the input is safe. This includes anything from stripping a null from the end of a user-supplied string to escaping out values so they are treated as literals.

Another common example of sanitizing input in Web applications is using URL encoding or HTML encoding to wrap data and treat it as literal text rather than executable script. HtmlEncode methods escape out HTML characters, and UrlEncode methods encode a URL so that it is a valid URI request.
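
Python's standard library offers counterparts to these HtmlEncode and UrlEncode methods; a minimal sketch:

```python
import html
import urllib.parse

user_input = '<b onmouseover="evil()">Hi</b>'

# Escape HTML metacharacters so the data is rendered as literal text.
safe_html = html.escape(user_input)
# -> '&lt;b onmouseover=&quot;evil()&quot;&gt;Hi&lt;/b&gt;'

# Percent-encode a value so it is a valid component of a URI.
safe_url = urllib.parse.quote("a b&c", safe="")
# -> 'a%20b%26c'
```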

In Practice

The following are examples applied to common input fields, using the preceding approaches:

Last Name field. This is a good example where constraining input is appropriate. In this case, you might allow string data in the range ASCII A-Z and a-z, and also hyphens and curly apostrophes (curly apostrophes have no significance to SQL) to handle names such as O'Dell. You would also limit the length to your longest expected value.

Quantity field. This is another case where constraining input works well. In this example, you might use a simple type and range restriction. For example, the input data may need to be a positive integer between 0 and 1000.

Free-text field. Examples include comment fields on discussion boards. In this case, you might allow letters and spaces, and also common characters such as apostrophes, commas, and hyphens. The set that is allowed does not include less than and greater than signs, brackets, and braces.

Some applications might allow users to mark up their text using a finite set of script characters, such as bold "<b>", italic "<i>", or even include a link to their favorite URL. In the case of a URL, your validation should encode the value so that it is treated as a URL.
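
The example fields above (last name, quantity, free text) might be validated as follows; the exact lengths and character sets are assumptions for illustration, and an ASCII apostrophe stands in for the curly apostrophe:

```python
import re

# Validators for the example fields; the lengths and character sets are
# assumptions, and an ASCII apostrophe stands in for the curly apostrophe.
LAST_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z'\-]{0,39}$")
FREE_TEXT_RE = re.compile(r"^[A-Za-z0-9 ',.\-]{1,1024}$")  # no < > [ ] { }

def valid_last_name(value: str) -> bool:
    return LAST_NAME_RE.fullmatch(value) is not None

def valid_quantity(value: str) -> bool:
    """A positive integer between 0 and 1000, as in the example above."""
    return value.isdigit() and 0 <= int(value) <= 1000

def valid_free_text(value: str) -> bool:
    return FREE_TEXT_RE.fullmatch(value) is not None
```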

In an ideal scenario, an application checks for acceptable input for each field or entry point. However, if you have an existing Web application that does not validate user input, you need a stopgap approach to mitigate risk until you can improve your application's input validation strategy. While neither of the following approaches ensures safe handling of input, because that is dependent on where the input comes from and how it is used in your application, they are used in practice today as quick fixes for short-term security improvement:

HTML-encoding and URL-encoding user input when writing back to the client. In this case, the assumption is that no input is treated as HTML and all output is written back in a protected form. This is sanitization in action.

Rejecting malicious script characters. This is a case of rejecting known bad input. In this case, a configurable set of malicious characters is used to reject the input. As described earlier, the problem with this approach is that bad data is a matter of context.

12/21/2010

It is important that configuration management functionality is accessible only by authorized operators and administrators. A key part is to enforce strong authentication over your administration interfaces, for example, by using certificates.

What to Do

Examine how the administration interfaces are secured.

Why

The consequences of a security breach to an administration interface can be severe, because the attacker frequently ends up running with administrator privileges and has direct access to the entire site.

When

If your design specifies remote administration, then you must secure the administration interfaces because of the sensitive nature of the operations and the data that is accessible over the administration interface.

How

If possible, limit or avoid the use of remote administration and require administrators to log on locally. If you need to support remote administration, use encrypted channels, for example, with SSL or VPN technology, because of the sensitive nature of the data passed over administrative interfaces. Also consider limiting remote administration to computers on the internal network by using IPSec policies, to further reduce risk.

Review the following aspects of your remote administration design:

Do you use strong authentication?

All administration interface users should be required to authenticate. Use strong authentication, such as Windows or client-certificate authentication.

Do you encrypt the network traffic?

Use encrypted communication channels, such as those provided by IPSec or virtual private network (VPN) connections. Do not support remote administration over insecure channels. IPSec allows you to limit the identity and number of client machines that can be used to administer the server.

Last week, I grumbled over the fact that a student can graduate with a Computer Science and Software Engineering degree and have zero exposure to software security. Huh? Doesn’t our society and business literally run on software?

We have plenty of decent (not fantastic, but decent) standards and guidelines on which to build a certification program that can plug the application security hole at universities; in fact, most of the standards (from NIST, DHS, the MS SDL, OWASP, etc.) could lend themselves to several, based on either role or “project” (to borrow the OWASP lingo).

We need to accept that (ISC)^2’s attempt at replicating CISSP for software with CSSLP is a failure. The test is a joke and the training/prep content overlaps with CISSP to a frightening level, watering down the value of CSSLP and, frankly, endangering the sanctity of CISSP in the process.

We need an organization to step up and sponsor an AppSec Cert program. Successful certification programs need three critical elements:

A sponsor that has market reach/penetration;

A body of content against which to construct training and exams; and,

The infrastructure from which to deliver and support the program.

OWASP could sponsor this. So could Microsoft or IBM – they own the lion’s share of the software development market (and don’t they each have a bunch of cert programs already?). An independent org like (ISC)^2 won’t be successful without a sponsor, imo. Finally, the infrastructure for such a program is key. What the PCI Security Standards Council has done with their QSA and PA-QSA audit certifications is a great model – and now they are moving the programs to eLearning for scale and efficiency. Amazingly, this group has got it right and is a model for us to follow in the AppSec world at large.

Who will be the organization to take a risk (albeit small) to sponsor a program that could have global impact on the single biggest problem area facing IT Security – the software level?

12/17/2010

For years, everyone from Mary Ann Davidson (CSO of Oracle) to OWASP to DHS (in their “Build Security In” initiative with SEI) has been bemoaning the fact that our universities do not adequately train software engineering and computer science students on secure coding practices (and in most cases do not train them at all). Even I have written and presented on the topic, calling for better training and awareness and complaining that industry shouldn’t have to bear the burden of educating software engineers on security. Well, I was wrong.

There are a few universities that are now starting to include security courses in their degree or certificate programs; however, it will take a very long time for that to propagate throughout industry, and the penetration of such courses is still very small. Mary Ann Davidson even offered to give preferential hiring treatment at Oracle to schools that demonstrated security training as part of their Computer Science and Software Engineering programs -- and got a pathetically weak response. Go figure.

I have never been a big fan of personal certifications; however, it is a model that works. Individuals like to own them, and companies like to hire employees who possess them. Cisco’s certifications for networking professionals and the now seemingly omnipresent CISSP ensure at least a minimum level of expertise in security disciplines. However, we still lack a practical and meaningful certification for anything related to application security.

Let me chew on this a bit and get back to you when I have more concrete thoughts on remediation.

When developing an application, it is best to define security objectives and requirements early in the process. Security objectives are goals and constraints that affect the confidentiality, integrity, and availability of your data and application.

Identification of security objectives is the first step you can take to help ensure the security of your application, and it is also one of the most important steps. The objectives, once created, can be used to direct all the subsequent security activities that you perform. Security objectives do not remain static, but are influenced by later design and implementation activities.

Security objectives should be identified as early in the development process as possible, ideally in the requirements and analysis phase.

Identifying security objectives is an iterative process that is initially driven by an examination of the application’s requirements and usage scenarios. By the end of the requirements and analysis phase, you should have a first set of objectives that are not yet tied to design or implementation details. During the design phase, additional objectives will surface that are specific to the application architecture and design. During the implementation phase, you may discover a few additional objectives based upon specific technology or implementation choices that have an impact on overall application security.

Each evolution of the security objectives will affect other security activities. You should review the threat model, architecture and design review guidelines, and general code review guidelines when your security objectives change.

Use the following techniques to help you discover security objectives:

Roles Matrix. When an application supports multiple roles, it is important to understand what each role should be allowed to do. This can be accomplished with a roles matrix that contains privileges in rows and roles in columns. Once the roles matrix has been created, you can generate security objectives to ensure the integrity of the application’s roles mechanism. Many systems have multiple roles, and privileges can be assigned flexibly to any role; in this case, your objectives need to be more general.

Derive From Functional Requirements. You can generate security objectives by examining every functional requirement in your application through the lens of confidentiality, integrity, and availability (CIA). This provides a very effective mechanism for generating security objectives based on known application characteristics.
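
The roles matrix technique above can be represented as a simple mapping from privileges (rows) to the roles (columns) allowed to exercise them; the roles and privileges shown here are hypothetical:

```python
# A hypothetical roles matrix: each privilege (row) maps to the set of
# roles (columns) allowed to exercise it.
ROLES_MATRIX = {
    "view_content": {"reader", "editor", "admin"},
    "edit_content": {"editor", "admin"},
    "manage_users": {"admin"},
}

def is_allowed(role: str, privilege: str) -> bool:
    """Unknown privileges are denied by default."""
    return role in ROLES_MATRIX.get(privilege, set())
```

A security objective derived from this matrix might be: "a reader must never be able to reach the edit_content path, regardless of how the request is crafted."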

12/16/2010

Write a class or library dedicated to error handling. Centralized error handling is easier to test and implement correctly, and because unhandled or inconsistently handled errors can leak sensitive details or leave the application in an inconsistent state, better error handling directly improves security. Perform the following actions to implement centralized error handling:

Identify possible types of errors. Review your application design to identify possible errors that you don't already have error handling code for. This information will be used to write new error handling code.

Use a global exception handler. Write a handler for exceptions that are not specifically handled by any other code, and install it so that it catches any exception that no other exception handler catches.

Use centralized error handling. When adding new functionality to the program, use the centralized error handling subsystem to handle errors and exceptions.
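
These actions can be sketched as a minimal centralized error-handling module; the logger name and function names are illustrative assumptions:

```python
import logging
import sys

# Minimal centralized error handling; the logger and function names are
# illustrative assumptions.
logger = logging.getLogger("app.errors")

def handle_error(exc: Exception, context: str = "") -> None:
    """Single point through which anticipated errors are reported."""
    logger.error("error in %s: %s", context or "unknown", exc)

def global_exception_handler(exc_type, exc_value, exc_traceback):
    """Last resort for exceptions no other handler caught."""
    logger.critical("unhandled %s: %s", exc_type.__name__, exc_value)

# Install the global handler for otherwise-unhandled exceptions.
sys.excepthook = global_exception_handler
```

New functionality then wraps risky work in try/except and routes exceptions to handle_error rather than inventing its own reporting.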

12/15/2010

What to Do

Applications should assume that all of their input is malicious, and take action accordingly. Input should be validated and either rejected or sanitized immediately, carefully quarantined during use, and encoded appropriately on output.

Why

Malicious input is the single largest cause of vulnerabilities in web applications, and in the most general sense, is the root cause of almost every issue. The only way to ensure safety is a defense-in-depth, default-deny policy that starts with the fundamental supposition that all input is malicious until proven otherwise. For example, if you call an external Web service that returns strings, how do you know that malicious commands are not present? Also, if several applications write to a shared database, when you read data, how do you know whether it is safe?

When

All applications should assume that all their input is malicious.

How

Getting input validation correct is tricky; there's a reason it's the number one security problem for web applications. However, when approached systematically, it's not too hard a problem to solve. Follow these steps:

1. Determine all inputs

The first step is to determine all the things in the application that can be controlled by the user. There are some surprises here -- a lot of the variables in a normal HTTP server environment are actually taken from the user's request, so make sure you know exactly where everything is coming from. It's a good idea to leave a brief comment in the code where the input comes in, mentioning where it comes from (if it isn't obvious from context), the expected format, and where it's validated (again, if it isn't obvious).

2. Determine all trusted data stores

Every application has at least one, and usually multiple, data stores. It's important to know when a data store can be trusted. The guideline here is simple: if the system in question is the only input into the data store, then you can rely on the semantics enforced by your input validation routines to apply to all data found in the store. If other applications access the data store, then you can't. While it is possible to check the semantics of every validation routine in every other system that accesses the data store, it's simpler and safer to assume that the data store is untrusted and treat it as a potential source of malicious data, validating all input from it as you would any other input.

3. Determine all crossover points

Crossover points are the places where malicious input becomes a bug. They're not necessarily places where output occurs; in fact, in large applications they'll often occur many layers further in than that. A crossover point is anywhere user input is included textually in some larger body of command text, or where a security-relevant decision is made based on it. A good example of a crossover point is a dynamic SQL query. The risk here is of the user input crossing over into the associated command text, allowing an attacker to execute commands. XPath and other XML injections are another example. The worst case is when user input is evaluated by a language's built-in "eval" command or something similar -- these commands should never be used, even with values that look safe, because of the associated risks.

Once the crossover points are found, all inputs should be traced back to make sure that they've been validated appropriately beforehand, and a comment again stating the format, source, and validation point should be made. All crossover points have, depending on the technology involved, different sets of safe characters. Using the whitelist approach described below, the safe set of characters for that crossover point should be compared against what the validator will allow through; the allowed characters must be a subset of the safe ones.

Whenever possible, steps should be taken to remove crossover points entirely. Switching from dynamic SQL to stored procedures with bound parameters removes an entire category of crossover points from the system, and greatly reduces risk to an entire class of attacks. Similar things can be done with other types of crossovers.
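
The dynamic-SQL case can be made concrete; this sketch uses SQLite with a bound parameter, so the attacker-controlled string never crosses over into the command text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

attacker = "alice' OR '1'='1"

# With a bound parameter, the input is pure data and cannot cross over
# into the command text.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (attacker,)
).fetchall()
# -> [] : the injection attempt matches nothing
```

Had the query been built by string concatenation, the same input would have rewritten the WHERE clause to match every row.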

4. Determine all outputs

The last point of concern is the list of outputs from the system. This will likely overlap somewhat with the list of crossover points, which is fine. Again, we need to determine the allowable format for each output and look at where the incoming data is being validated. If there's any question of whether the data may contain dangerous characters, it should be encoded in a manner appropriate to the specific output. There are more output contexts than one might think; the contents of HTML attributes, the tags themselves, free text between the tags, and JavaScript strings all have different safe sets of characters (and a different encoding, in the last case). Comments on the input source, format, validation point, and encoding point are also useful here.

5. Build a centralized validation module

One of the biggest dangers of implementing input validation is inconsistent validation; an attack may be caught on one data path but not on another, and an attacker will try all of them. The way to solve this problem is to have a single point of responsibility for input validation. Where it lives depends on the design. If every piece of input is an object, then it may be appropriate to have the object constructors and setters perform the validation for that object's input. In a less strictly OO system, a single module with methods for each different input format may be more appropriate.

Whichever method is chosen, the input validation routine for a specific data type should be as strict as possible. For example, when validating a US zip code, allow either 5 or 9 digits, and nothing else. If you're dealing with international postal codes, either validate them separately with a looser format that also allows letters, or, if you need to ensure a higher level of integrity, build a more complex validator that understands the postal codes of each nation.
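
A strict validator along these lines might look like the following; raising on failure (rather than returning a flag) is a design assumption, chosen so callers cannot silently ignore invalid input:

```python
import re

# One module owns each input format; each validator is as strict as possible.
_US_ZIP = re.compile(r"^(\d{5}|\d{9})$")  # exactly 5 or 9 digits, nothing else

def validate_us_zip(value: str) -> str:
    """Return the value unchanged if valid; raise so failure cannot be ignored."""
    if _US_ZIP.fullmatch(value) is None:
        raise ValueError("invalid US zip code")
    return value
```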

6. Build a centralized encoding module

In an ideal world, all encoding would be done by the same libraries that are used to create output. While many HTML control libraries attempt this, none of them take the whitelist approach; instead, they try to guess which characters might be harmful, a list which is categorically incomplete. Unless you want to build a new output library (which might be an option on a large enough application), you should build a set of data encoders for each output context you have. These encoders should be used as close as possible to the actual point of output; this minimizes the chance of an alternate data path skipping the encoding and ensures that the developer knows exactly what context the output is being used in. Avoid the temptation to store encoded data, because even if it is initially used only in the context you encoded it for, this may change over time.
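
Such an encoder module might expose one function per output context. The function names are illustrative, and the JSON encoder is used here simply as a convenient way to produce a quoted JavaScript string literal:

```python
import html
import json
import urllib.parse

# One encoder per output context, applied at the point of output.
def encode_for_html_text(value: str) -> str:
    return html.escape(value)              # free text between tags

def encode_for_html_attribute(value: str) -> str:
    return html.escape(value, quote=True)  # inside a quoted attribute

def encode_for_js_string(value: str) -> str:
    return json.dumps(value)               # a quoted JavaScript string literal

def encode_for_url(value: str) -> str:
    return urllib.parse.quote(value, safe="")  # a single URL component
```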

7. Ensure that all paths through the system preserve validation

Once the validation system is complete, all the paths that data takes through the system should be checked to ensure that they preserve the expected validation properties. Input that is sent round-trip through a client or another system must be re-validated, unless a cryptographic signature is used to ensure that it has not been tampered with. Validation that occurs on an untrusted system must also be repeated. Client-side validation in JavaScript is a nice UI touch, but as a security measure it is trivially circumvented.
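
One common way to protect round-tripped data is an HMAC signature, sketched below; the secret key shown is a placeholder for a key held in server-side configuration:

```python
import hashlib
import hmac

# Sign data before it round-trips through the client; verify it on return.
# SECRET_KEY is a placeholder for a key held in server-side configuration.
SECRET_KEY = b"server-side-secret"

def sign(data: str) -> str:
    return hmac.new(SECRET_KEY, data.encode(), hashlib.sha256).hexdigest()

def verify(data: str, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(data), signature)
```

If the client tampers with the data, verification fails and the input must be treated like any other untrusted input.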

12/14/2010

Centralizing input validation helps ensure that data is validated in a consistent way throughout the application and provides a single point of maintenance. Perform the following steps to ensure that all input is validated:

Centralize validation. When you develop an input- and data-validation architecture for your application, consider developing a library of validation routines in all but the smallest applications. This will help ensure that data is validated in a consistent way throughout the application and provide a single point of maintenance. You need to trace data from entry point to exit point to know how it should be validated. A good library includes routines for all of the different types of validation you need to apply, and these can be used in combination if necessary.

Constrain, reject, and sanitize input. Constrain what you allow from the beginning. It is much easier to validate data for known valid types, patterns, and ranges (using a white list) than it is to validate data by looking for known bad characters (using a black list). When you design your application, you know what your application expects. The range of valid data is generally a more finite set than the range of potentially malicious input. However, for added defense you might want to reject known bad input and then sanitize the input. Constrain input for type, length, format, and range. Use regular expressions to help constrain text input. Use strong data typing where possible.

Identify trust boundaries. Ensure that entry points between trust boundaries validate all input data explicitly. Make no assumptions about the data. The only exception is inside a routine that you know can only be called by other routines within the same trust boundary.