One of the first things a malware analyst does when encountering a suspicious executable file is to extract the text strings found inside it, because they can provide immediate clues about its purpose. This operation has long been considered safe, but a security researcher has found that it can actually lead to a system compromise.

String extraction is typically done using a Linux command-line tool called strings that's part of GNU Binutils, a collection of tools for binary file analysis and manipulation available by default in most Linux distributions.

Google security engineer Michal Zalewski was recently running a type of vulnerability testing known as fuzzing against a library called libbfd (the Binary File Descriptor library) that sits at the core of GNU Binutils and is used for file format parsing. Fuzzing is the practice of feeding unexpected or malformed input to software, in this case libbfd, in order to trigger potentially exploitable behavior.

What Zalewski found was, in his own words, "a range of troubling and likely exploitable out-of-bounds crashes due to very limited range checking." These are the kinds of errors that can lead to arbitrary code execution.

"Many shell users, and certainly most of the people working in computer forensics or other fields of information security, have a habit of running /usr/bin/strings on binary files originating from the Internet," Zalewski said in a blog post in which he documents one such vulnerability. "Their understanding is that the tool simply scans the file for runs of printable characters and dumps them to stdout -- something that is very unlikely to put you at any risk."

According to Zalewski, that's not the case because the strings utility relies on libbfd to optimize the analysis process for supported executable formats. This means an attacker could create a binary file that exploits vulnerabilities in libbfd when analyzed by the strings utility in order to execute arbitrary code on the underlying system.

The problem is made worse by the fact that many Linux distributions ship the strings utility without address space layout randomization (ASLR), a protection mechanism that makes exploiting vulnerabilities harder. This makes potential attacks "easier and more reliable -- a situation reminiscent of one of the recent bugs in bash," Zalewski said.

The impact is not limited to strings. Other GNU Binutils components like objdump and readelf, or even custom tools that leverage libbfd, are likely susceptible to similar attacks.

Executing strings against a binary file downloaded from the Internet is not something a regular user would normally do -- at least not without being socially engineered by the attacker. However, the risk is much higher for people whose job it is to analyze hostile files every single day.

"The bottom line is that if you are used to running strings on random files, or depend on any libbfd-based tools for forensic purposes, you should probably change your habits," Zalewski said. "For strings specifically, invoking it with the -a parameter seems to inhibit the use of libbfd. Distro vendors may want to consider making the -a mode default, too."
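The behavior most users expect from strings, a plain scan for runs of printable characters with no file format parsing, can be sketched in a few lines of Python. This is only an illustration of the idea, not the code of the GNU tool: the function name extract_strings, the minimum run length of 4 and the exact set of characters treated as printable are assumptions made for this example.

```python
import re
from typing import List

# Assumption for this sketch: a "string" is a run of at least MIN_LEN
# printable ASCII bytes (space through tilde) or tabs.
MIN_LEN = 4


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> List[str]:
    """Return every run of printable ASCII bytes of at least min_len bytes.

    The input is treated as an opaque byte sequence: no headers are parsed
    and no sections are located, so no file format library is involved.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Reading a sample in binary mode and passing its bytes to extract_strings yields the printable runs in file order. Because the scan never interprets the file's structure, a malformed header has nothing to corrupt.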

It's true that most malware researchers and computer forensics specialists analyze suspicious files in controlled environments, on systems specifically set up for this purpose. However, they are also known to make the occasional exception when they need a quick result, especially with such seemingly safe operations as string extraction.

"I'm sure many of us are guilty of running 'strings' on an untrusted file at one point or another outside of our test systems, so this does serve as a reminder that nothing is safe and vulnerabilities can be found in any code," said Carsten Eiram, the chief research officer at vulnerability intelligence firm Risk Based Security, via email.

A compromise is not desirable even when it involves just a dedicated system used for analysis.

"A researcher wouldn't want that system to be probed from the outside," said Bogdan Botezatu, a senior e-threat analyst at antivirus vendor Bitdefender, via email. "An attacker could gain intelligence about the network topology, the tools running on the respective computer or even deny service on that machine. It's mostly intelligence harvesting rather than compromising the organization, but it's still a threat that should be taken into account."

The risk posed by libbfd vulnerabilities also extends beyond the security industry.

"There are various tools that use libbfd, including some debug utilities that extract relevant data from crash dumps," Botezatu said. "They all depend on libbfd, whether these tools are used for forensics or debugging."

Nor is exploitation limited to cases where strings is run manually. Some automated tools rely on libbfd-based utilities to analyze samples submitted by other internal systems or directly by users from the Internet.
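For automated pipelines that shell out to strings, one way to apply Zalewski's advice is to build the command in a single place so that the -a parameter is always passed. The following is a minimal sketch, assuming GNU strings is available on the PATH; the helper names build_strings_command and run_strings, the 30-second timeout and the sample path are hypothetical.

```python
import subprocess
from typing import List


def build_strings_command(sample_path: str, min_len: int = 4) -> List[str]:
    """Build an argument list for strings that always includes -a.

    -a asks strings to scan the whole file, -n sets the minimum run length,
    and "--" ends option parsing so a sample whose name starts with "-"
    is not mistaken for an option.
    """
    return ["strings", "-a", "-n", str(min_len), "--", sample_path]


def run_strings(sample_path: str, timeout_s: float = 30.0) -> str:
    """Run strings on one sample and return its output as text.

    Hypothetical helper: a real pipeline would still run this inside an
    isolated analysis environment.
    """
    result = subprocess.run(
        build_strings_command(sample_path),
        capture_output=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout.decode("ascii", errors="replace")
```

Passing the arguments as a list rather than a shell string also keeps an attacker-chosen file name from being interpreted by a shell.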