Security

Programming Language Format String Vulnerabilities

Are C/C++ the only languages with security vulnerabilities? What about Perl, PHP, Java, Python, and Ruby?

Hal is a Vulnerability Researcher at CERT. He can be contacted at www.hburch.com.
Robert C. Seacord is Senior Vulnerability Analyst for CERT/CC. He can be reached at rcs@cert.org.

Although not as well known as other vulnerability types such as buffer overflows, format string vulnerabilities have been known to exist in C and C++ programs since at least 1999, when a format string vulnerability was found in AnswerBook2 (cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-1999-1417). Formatted output became a major focus of the security community in June 2000, when a format string vulnerability was discovered in the Washington University ftpd (WU-FTPD) software package (www.kb.cert.org/vuls/id/29823).

But format string vulnerabilities are not limited to programs written in C and C++. Other languages that support format strings include Perl, PHP, Java, Python, and Ruby. While these languages are relatively immune to buffer overflows because they manage dynamic arrays and strings for programmers, programs written in them may still contain format string vulnerabilities.

Format string vulnerabilities result from including data from an untrusted source, such as a user, in a format string. Format strings are used by input and output routines to specify a conversion between a character string and a set of data values. The following example shows how the C function printf() accepts a format string and a set of values:

printf("%s Pop: %11d\n", country, pop);

and produces a string:

United States Pop:   295734134

In the format string, the % begins a conversion specification. This is followed by a set of formatting parameters and the data type. The %s conversion specifier instructs printf() to output a string value (the value passed as an argument). The %11d conversion specifier instructs printf() to output a decimal value (the "d") in an 11-character field. Format strings can be much more complicated, including flags, precisions, length modifiers, and even variable widths specified in parameters.
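As a minimal sketch of these conversion specifications, the following formats the same line into a buffer with snprintf() so the padding is easy to inspect. The function names format_population() and format_population_width() are hypothetical and chosen only for illustration; the second one shows a field width supplied as an argument through the * width.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats one report line into buf.
 * Returns whatever snprintf() returns (the length of the full line). */
int format_population(char *buf, size_t size, const char *country, int pop)
{
    /* %s prints the string argument; %11d right-justifies the
     * integer argument in a field 11 characters wide. */
    return snprintf(buf, size, "%s Pop: %11d", country, pop);
}

/* Hypothetical helper: same line, but the field width is taken
 * from an int argument by writing '*' in place of the width. */
int format_population_width(char *buf, size_t size, const char *country,
                            int width, int pop)
{
    return snprintf(buf, size, "%s Pop: %*d", country, width, pop);
}
```

Because a nine-digit value is padded on the left to fill the 11-character field, two pad spaces appear in addition to the literal space after the colon.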

Directly including user input in a format string lets an attacker inject conversion specifications into the format string. This is particularly problematic in programming languages that support the relatively obscure %n specification. This unusual specification causes the number of characters successfully written so far to be stored in the integer whose address is given as the argument. If attackers can write data values to memory, they can often leverage that to gain control of the system. Even if the language does not support %n, an attacker may cause the format string to include more specifications than parameters. Depending on what stack protection exists in the language, an attacker may be able to access private data, evade logging, or crash the program. (Writing exploits for a format string vulnerability is beyond the scope of this article. For a more detailed explanation, see Robert Seacord's Secure Coding in C and C++; Addison-Wesley, 2005.)
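To make the behavior of %n concrete, here is a small, benign sketch in which the format string is a fixed literal, so nothing is under an attacker's control. The function name chars_before_marker() is hypothetical. It assumes a C library that honors %n; some libraries restrict or disable it, in which case this call may fail or abort.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns the value stored by %n,
 * or -1 if formatting failed. */
int chars_before_marker(char *buf, size_t size)
{
    int count = -1;

    /* %n prints nothing. It consumes an int * argument and stores
     * the number of characters produced so far ("hello" is 5). */
    if (snprintf(buf, size, "hello%n world", &count) < 0)
        return -1;
    return count;
}
```

The write through the pointer is what makes %n dangerous: when the attacker controls the format string, the attacker effectively chooses when that store happens and, with enough control over the arguments read from the stack, where it lands.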

Format string vulnerabilities often result from a programmer being unaware that a particular routine takes a format string. For example, you can write:

Unfortunately, the syslog() routine uses its second parameter as a format string. As a result, if an attacker enters an e-mail address such as "webmaster%s%s%s%s@example.com", syslog() looks for arguments to match the %s conversion specifiers in the format, most likely causing the program to crash. A more advanced attack may use %n to gain control of the system.
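A minimal sketch of the pattern just described and its fix, assuming a POSIX system with <syslog.h>. The function name log_email_change(), the LOG_INFO priority, and the parameter name email are hypothetical choices for illustration, not taken from the original listing.

```c
#include <syslog.h>

/* Hypothetical helper: records a user-supplied e-mail address
 * in the system log. */
void log_email_change(const char *email)
{
    /* Vulnerable form (shown only as a comment):
     *     syslog(LOG_INFO, email);
     * Here the untrusted text itself would be the format string. */

    /* Safe form: the format is a fixed literal, and the untrusted
     * text is passed only as the argument that %s consumes. */
    syslog(LOG_INFO, "%s", email);
}
```

With the safe form, the address "webmaster%s%s%s%s@example.com" is logged verbatim, because the percent signs in the argument are never interpreted as conversion specifications.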

Another common source of format string vulnerabilities is when you need to write an error to more than one location. For simplicity, you may construct the string using snprintf() and then use one routine to print the message to a log and another routine to output the message to the end user in some way, such as in a message box. If either routine allows for format strings, you must be careful to include the format specification in the call:

fprintf(log, "%s", logmessage);

instead of neglecting it as in the following call:

fprintf(log, logmessage);

The first invocation is the correct one, avoiding a format string vulnerability by specifying that a string (%s) should be output and then providing that string. Because the second is shorter and may correspond to how you are thinking about the desired behavior, you may write the statement in this fashion without considering the consequences.
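The two-destination scenario can be sketched as follows, under stated assumptions: report_error() is a hypothetical helper, the display buffer stands in for whatever routine shows the message to the end user, and the message text is invented for illustration. The message is built once with snprintf() and is then always passed as an argument to "%s", never as the format itself.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: builds one error message, writes it to a log
 * stream, and copies it into a buffer destined for the end user.
 * Returns 0 on success, -1 on a formatting error or truncation. */
int report_error(FILE *log, char *display, size_t display_size,
                 const char *filename)
{
    char logmessage[256];
    int n;

    /* Build the message once. filename may be untrusted, so it is
     * passed as an argument to a fixed format string. */
    n = snprintf(logmessage, sizeof logmessage,
                 "cannot open file: %s", filename);
    if (n < 0 || (size_t)n >= sizeof logmessage)
        return -1;

    /* Destination 1: the log. The assembled message now contains the
     * untrusted text, so it must not be used as the format. */
    if (fprintf(log, "%s\n", logmessage) < 0)
        return -1;

    /* Destination 2: the user-facing copy, with the same precaution. */
    n = snprintf(display, display_size, "%s", logmessage);
    if (n < 0 || (size_t)n >= display_size)
        return -1;

    return 0;
}
```

The important point is that the precaution applies at every hop: once untrusted text has been folded into logmessage, every later routine that accepts a format string has to treat logmessage as data.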

In this article, I explore the potential consequences of format string vulnerabilities in Perl, PHP, Java, Python, and Ruby programs.
