FDA soups up open data with research-friendly openFDA

By Stephanie Kanowitz

Jun 11, 2014

The Presidential Executive Order on Open Data pushed many federal agencies to develop open data projects, so when the Food and Drug Administration officially launched openFDA on June 2, it was in one sense just another program to add to the list. But openFDA raised the bar for open data offerings with its search-based application programming interface, which lets researchers type in queries and get relevance-ranked results.

“We’re taking what we call a ‘search approach’ rather than, ‘OK, I’m going to put my data out there and it’s up to you now to take it from that level,’” said Taha Kass-Hout, chief health informatics officer at FDA. “FDA has already been doing that for decades. There’s nothing new about that.” This project goes beyond just the data, he said. “It’s about building community and open[sourcing] everything that we’ve done.”

According to Kass-Hout, openFDA offers a “scalable platform that can be easily searched and queried across many distinct datasets, can be easily redeployed or altered to fit a variety of purposes and provides an innovative public data search and analytics solution.”

Here’s the difference: The FDA mapped the data set to drug identifiers, ingredients and other details so users can find what they’re looking for by typing drug names, QR or UPC codes or even reaction symptoms. Misspelled queries will likely still return accurate results because matches are scored for relevance, much as search engines rank their results.
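To make the search approach concrete, here is a minimal sketch of how a developer might build a query for adverse event reports that mention a given drug. The endpoint and the field name are assumptions taken from openFDA's public documentation, not from this article, and the helper name build_search_url is hypothetical. The sketch only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode, quote

# Assumed endpoint for drug adverse event reports (not named in the article).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_search_url(field, term, limit=5):
    """Build a search URL that matches `term` as a phrase in `field`.

    `field` is an assumed openFDA field name, for example
    "patient.drug.medicinalproduct" for the reported drug name.
    """
    search = f'{field}:"{term}"'  # field:"phrase" query syntax
    params = {"search": search, "limit": limit}
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_search_url("patient.drug.medicinalproduct", "aspirin")
print(url)
```

Fetching the resulting URL should return a small page of matching reports instead of the full data set, which is the practical difference from a bulk download.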

FDA enhanced the data set even further by adding hooks to other sources, such as the National Institutes of Health and MedlinePlus. That means doctors can link electronic medical records to the adverse event data, for instance.

“Traditionally, the drug adverse event reports had been made available by FDA, but unless you were a member of industry or had a lot of experience working with that data set, it was very difficult to use,” said Sean Herron, a Presidential Innovation Fellow who worked on the openFDA project. “There wasn’t a lot of easy-to-understand documentation around it, and you had to download all 3.8 million reports in order to get a single one of them. With openFDA, we’ve dived very deeply into that data set and cleaned up some of the most common pitfalls of it.”

OpenFDA is hosted on an agency-approved public cloud. FDA chose cloud to address factors of cost and scalability, Kass-Hout said. “As more and more of the communities grow larger, we don’t have to anticipate building a data center,” he said. “It’s just a matter of scaling to the need.”

Two applications have already been rolled out since openFDA launched this month.

Social Health Insights, an Indiana-based business, used the openFDA API to build a simple query interface that lets people search for adverse drug effects by date and location. Researchers can also specify one or multiple drug names or adverse reactions, and results can be broken down by gender.
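A query of that kind might be assembled roughly as follows. This is a sketch under stated assumptions, not the Social Health Insights implementation: the endpoint, the field names (receivedate, occurcountry, patient.drug.medicinalproduct, patient.patientsex), the count parameter and the parenthesized OR grouping are taken from openFDA's public documentation or assumed from its search syntax, and the helper name build_count_url is hypothetical. No request is sent.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

# Assumed endpoint for drug adverse event reports (not named in the article).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_count_url(drugs, start, end, country=None,
                    count_field="patient.patientsex"):
    """Build a URL that filters reports and tallies them by `count_field`.

    `start` and `end` are dates as YYYYMMDD strings. All field names
    used here are assumptions, not details confirmed by the article.
    """
    # Restrict to reports received within the date range.
    clauses = [f"receivedate:[{start} TO {end}]"]
    if drugs:
        # Match any one of several drug names.
        names = " OR ".join(f'"{d}"' for d in drugs)
        clauses.append(f"patient.drug.medicinalproduct:({names})")
    if country:
        # Restrict to the country where the event occurred.
        clauses.append(f'occurcountry:"{country}"')
    params = {"search": " AND ".join(clauses), "count": count_field}
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_count_url(["aspirin", "ibuprofen"], "20040101", "20140601",
                      country="US")
# Decode the query string again to show the readable search expression.
print(parse_qs(urlsplit(url).query)["search"][0])
```

Counting on a field asks the service for a tally per value rather than a list of reports, which is one plausible way to produce a breakdown by gender.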

The other app was developed by Epidemico, a Boston startup that looks at adverse drug reactions reported in social media outlets and matches them to official reports.

Before openFDA, an application like that would have taken years to build because developers would have had to know every data set and where to find it, Kass-Hout said. Then they would have had to download the data and stitch it together. “The applications were done literally over a couple of nights,” he said.

Reaction to the FDA’s upgraded data sets in the developer community has been positive. Kin Lane wrote on his blog, API Evangelist, that openFDA’s launch succeeded not only technically but also in being collaborative and responsive and in explaining use cases well.

“As I’ve said a thousand times before, it doesn’t matter how technically perfect your API is, without the business and politics of your API dialed in, you will fumble the ball,” Lane wrote.

FDA plans to add more data sets throughout the summer that will target other adverse effects, product labels and product recalls. Other agencies with open data projects and APIs include NASA (data.nasa.gov) and the Census Bureau (www.census.gov/developers). Many more federal agencies’ open data projects are listed on Data.gov.