The expanded Samate Reference Dataset from NIST helps developers squeeze new bugs out of their code

The US National Institute of Standards and Technology (NIST) has expanded its database of software flaws to help developers avoid introducing bugs into their code in the first place.

The Software Assurance Metrics and Tool Evaluation (Samate) Reference Dataset contains examples of software issues that could leave applications vulnerable to attackers. Version 4.0 of Samate, released 22 November, contains 175 broad categories of weaknesses with over 60,000 specific cases, more than doubling the number of categories that were included in the previous release.

Security flaw detection

Samate was launched in 2004 to improve software assurance by making it easier to identify and exclude known issues. The database helps developers test software offerings for known security vulnerabilities before going to market.

“It brings rigour into software assurance, so that the public can be more confident that there are fewer dangerous weaknesses in the software they use,” said Michael Koo, project leader at NIST.

Basic vulnerabilities such as SQL injection and cross-site scripting still account for a majority of security flaws in Web applications. Hacktivists operating under the Anonymous banner compromised several high-profile sites this year by exploiting those issues.

SQL injection, classic buffer overflow and operating system command injection errors were among the top errors highlighted by the SANS Institute in June in its annual list of the top 25 most dangerous software errors.
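
To make the SQL injection weakness concrete, the following is a minimal Python sketch using the standard sqlite3 module. It is not taken from the Samate dataset; the table layout and the function names lookup_unsafe and lookup_safe are hypothetical, chosen only for illustration. The first function builds the query by string concatenation, the second passes the input as a bound parameter.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Weakness: user input is pasted straight into the SQL text,
    # so quote characters in `name` can change the query's meaning.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Fix: the input is passed as a bound parameter and is only ever
    # treated as a value, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    db = make_db()
    payload = "nobody' OR '1'='1"
    print("unsafe:", lookup_unsafe(db, payload))
    print("safe:  ", lookup_safe(db, payload))
```

With the input nobody' OR '1'='1, the concatenated query matches every row and returns both secrets, while the parameterised query looks for a user with that literal name and returns nothing.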

Security experts have long urged software developers and vendors to bake security into the development lifecycle and scan for common coding errors, instead of checking for potential security issues just before going to market. Security has to be part of the design, and everyone has to be thinking about security implications from the start, Marisa Viveros, vice president of IBM Security Services, told eWEEK.

The weaknesses might be compared to grammatical errors in a page of writing, errors that inadvertently instruct a computer to do things that leave it open to cyber attack, said NIST. A number of popular programming languages, including Java, C and C++, are represented.

Each coding mistake is listed in the database with a code sample illustrating how the way a function was written created the vulnerability. The database is fully searchable by language, by type of weakness and by specific code sample. Developers receive the search results as a downloadable Zip file.

Fatal errors

The “act of checking out software” by making sure it is not vulnerable to cyber-attack has become “so complicated” that developers rely on a static analyser program to help with the checking, NIST said. Static analysers run through the code looking for obvious problems, but they can only find the weaknesses they have been programmed to find. Static analyser vendors can draw on the expanded Samate database as a larger reference set to check against, which would help their tools catch more errors, according to NIST.
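
The point that an analyser only finds what it has been programmed to find can be shown with a toy example. The Python sketch below is a deliberately naive, hypothetical scanner, not how any commercial analyser or the Samate dataset works: it flags calls in C source whose names appear in a rule table. The rule entries, the scan function and the sample program are all assumptions made for illustration.

```python
import re

# Hypothetical rule table: function name -> why a call is suspicious.
RULES = {
    "gets": "unbounded read into a buffer",
    "strcpy": "copy without a length check",
    "sprintf": "formatted write without a length check",
}

# Matches an identifier followed by "(", a rough stand-in for a call.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def scan(source, rules=RULES):
    """Return (line number, function, reason) for each flagged call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            name = match.group(1)
            if name in rules:
                findings.append((lineno, name, rules[name]))
    return findings


# Small C program with two weaknesses; only one is in the rule table.
SAMPLE = """\
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char buf[8];
    gets(buf);
    system(buf);
    return 0;
}
"""

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
    # A larger rule set catches the command injection as well.
    bigger = dict(RULES, system="shell command built from input")
    for finding in scan(SAMPLE, bigger):
        print(finding)
```

With the default rules the scanner reports only the gets call on line 5 and misses the system call on line 6. Adding one more rule makes it report both, which is the effect a larger reference set is meant to have on real tools.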

Samate complements other programmes with similar goals, such as the Common Weakness Enumeration and the Common Vulnerabilities and Exposures (CVE) databases maintained by Mitre. Many companies rely on CVE to identify bugs that they have patched in their software. Samate does not yet cover as many known issues as CVE, which has close to 500 types.

The NIST team plans to improve the dataset by including more errors and supporting more programming languages, according to Koo. Future versions will also include larger code samples; the current ones are about a page long. There are also plans to explore vulnerabilities in large open source software packages of up to a million lines of code and include those issues in the dataset.