At each step, there are security and consistency concerns that must be observed and understood in order for XMB to remain stable.

There are also many situations where the flow of data is different. For example, user inputs are sometimes needed for immediate output rather than including them in a database query. Output sanitization is often performed at the same time as input sanitization in order to eliminate one step and to put output-safe data in the database. Sometimes data retrieved from the database are needed for immediate use in another database query rather than being transmitted.

Data consistency must be understood in each situation to prevent writing confused code that will work in some situations but not others.

Why We Never Use Raw Input

With nearly zero exceptions, the following variables should never be used in new code for XMB:

1. Raw user input is never safe to use in database queries. Anyone could innocently or maliciously use a character such as an apostrophe, quote, or backslash, which are some of the special characters used in the SQL query language.

2. Raw user input is never safe to transmit as output. No matter what the context is, be it an HTTP header, part of an HTML body (CDATA), an HTML attribute value, javascript string, or even an e-mail message, every output language has a syntax with special characters.

3. Raw user input is subject to change by the PHP environment. In the past, XMB made efforts to ensure that all inputs were "magic quoted" for consistency. However, this caused as many problems as it solved because the debugging process was convoluted and unreliable. There was also a bad practice of using stripslashes() arbitrarily throughout the code base to solve those extra problems, further compounding the matter by dangerously restoring the raw input in some situations.

It was decided during the development of XMB 1.9.8 SP3 that a greater level of consistency could be achieved by always assuming that the input format is unknown until it has been sanitized by one of the new functions in validate.inc.php. Therefore, the validate.inc.php script is always responsible for checking the PHP environment for things like magic quotes.

4. Raw user input is not always supported by PHP's built-in functions. What if the variable does not contain a string data type? What if it does contain a string, but there are NUL bytes embedded in it? Can you afford to worry about these kinds of issues with every variable? Of course not. The data sanitization functions in validate.inc.php are responsible for this, and they offer a central location for tweaking security issues when they are used correctly.

5. Uninitialized variables can be inadvertently included in XMB's template system, creating an XSS vulnerability. It is a bit unfortunate that XMB forces all input variables to be registered globals, but until we can change that it's a great motivation to always initialize (and maybe even unset) every variable that is being used.

General Input Sanitization

Beginning with XMB version 1.9.8 SP3, the preferred method for accessing all string input data is to use function postedVar().

When the default arguments are used, postedVar does several things to the input automatically:

1. All environmental issues are handled. Magic quotes, NUL bytes, and non-string data will be removed to obtain a true, raw value.

2. The raw input value will be output-sanitized for use in any CDATA context. This is the most common format used in XMB because it corresponds to general HTML output.

3. The value will be SQL-sanitized for use in most ~MySQL queries.

Right away, you should realize that the defaults are not always the correct arguments. This is the most common form, yes, but it is only correct when data are flowing into the database to be used later for CDATA output.

In the example above, it was important that the input be sanitized for an attribute context. This is more restrictive than a CDATA context because quotes and slashes are special characters.

There are many combinations of arguments that can be used with postedVar(), and each one has a different purpose. This again underscores the importance of knowing the storage and output contexts of the data being handled, and the convenience of relying on a single function to handle environmental issues.

Database Query Sanitization

Stub

Database String Formats

This document describes the different, and sometimes inconsistent, string types that have been used in the XMB database schema.

History

Although one might assume that the information stored at the database layer of XMB is in a raw format for simplicity or portability, that has actually never been true. In version 1.9.10, the number of string formats has been reduced to two. Prior to that, at least four different formats were in use.

Our current understanding of the historical XMB schema suggests that previous authors attempted to secure the database layer by injecting slashes into the raw data, then filtering all outputs to remove those slashes. This had the effect of sometimes preventing SQL errors when preparing a query with a value that had been retrieved from the database. This also had the effect of preventing any normal use of slashes in forum posts, such as code snippets posted in development conversations.

For example, if someone posted a message body that said,

"Hello world, isn't this nice"

then XMB would convert the raw value to,

"Hello world, isn\'t this nice"

If you were to observe this value at the database layer, or at some transient point before being output, that is exactly what you would see. The confusion arose in how this was originally implemented. At the database layer, the raw value still needed to be escaped, so the query string always looked like,

"Hello world, isn\\\'t this nice"

This was accomplished using magic_quotes_gpc or equivalent logic in XMB's global.inc.php file. Then, before being output, the value retrieved from the database was arbitrarily run through the stripslashes function multiple times. As a result, a raw input of,

"C:\WINDOWS"

would be sent to the database as

"C:\\\\WINDOWS"

retrieved from the database as

"C:\\WINDOWS"

and ultimately output to the user as

"C:WINDOWS"

The inconsistencies within the database itself were only a result of not applying the above strategy to all of the various string fields. The columns that were double-slashed by XMB did have consistent values. So, the data were usable, but there was no documentation to say which fields were double-slashed and which were not. There has been no research yet on old versions of XMB to produce a comprehensive schema. Generally, double-slashed values have been seen in the subject and message fields of posts and U2Us, the description fields of forums, and in many of the members and settings fields.

As if this weren't confusing enough already, the string format was further transformed anywhere members were allowed to post raw HTML. Rather than consistently storing the raw input or the sanitized input, XMB would store either format, depending on the configuration when the input was received. So, a raw input of

"5 is < 6"

was saved as-is if HTML was allowed, or saved as

"5 is &lt; 6"

if HTML was forbidden. This had the effect of allowing stored values to be output without consideration as to their sanitization. This also had the effect of breaking the HTML setting; if the administrator chose to disable HTML on the board, then the input filter became active, but all of the stored values were still being output as raw HTML, to the detriment of security.

The big picture was that any of the following representations of the same string value were likely to exist in the XMB database:

XMB 1.9.10 Schema

As of version 1.9.10, a great deal of effort was made to simplify the schema and make it more consistent. This was done successfully, although imperfectly due to the scope of convolution and lack of documentation for the pre-existing schema.

Generally, there are only two string formats used now: Single-quoted and double-quoted, with all HTML special characters being escaped. To continue the example from above, you should see one of these two representations in the database:

"Bill & Ted's Excellent Adventure"

or

"Bill & Ted\'s Excellent Adventure"

The former, single-quoted format is used in almost all fields.

The latter, double-quoted format is used only in the following fields:

Those six fields were chosen as the most obvious instances of double-quoting in historical versions, and the most problematic for portability. Therefore, the format has been maintained for those fields, but those fields only. The other fields that had been double-quoted (some of which have yet to be identified!) were simply not converted during upgrades. One common complaint about this change is that administrators upgrading to version 1.9.10 or later often find extra slashes appearing in users' signatures or the settings panel, requiring manual edits. The decision to not automatically correct those values during upgrades was made due to the existing inconsistency (upgraded values were double-slashed, new inputs in the meantime were not) and due to the difficulty of identifying which fields were in which format.

Storing all user input in an HTML escaped format has two advantages. This format preserves the original intention of being able to output stored values without regard to sanitization (assuming the context is appropriate). This format also allows the value to be un-escaped as desired at runtime.

There are still some rare exceptions where XMB stores raw strings including HTML, but they are limited to administrative areas.