The Novartis team recently embarked on a project to bring together huge volumes of disparate biological data, which they are able to search at record speeds thanks to the Neo4j graph database.

In this week’s five-minute interview (conducted at GraphConnect San Francisco), we discuss how the Novartis biomedical research team uses Neo4j to mine huge volumes of biological data to develop the next generation of medicines.

Talk to us a bit about how you use Neo4j at Novartis.

At the research arm of Novartis Pharmaceuticals, we created a large graph of all kinds of heterogeneous biological data, which we’re combining with text-mining results. Merging this data into one giant graph helps us better understand biology and apply that scientific knowledge to develop the next generation of medicines.

What made you choose to work with Neo4j?

It’s really about Cypher. It became very easy to adopt once the data was in the database, which — by the way — we don’t use in a transactional mode.

We compile the data to mine internally, and if you have a thought about some information that might be relevant, you can easily formulate the corresponding Cypher query and get your results. That, for us, is really the killer feature of Neo4j.

What have been some of the most surprising or interesting results you’ve encountered while using Neo4j?

There is a huge amount of biological data available, along with incredible data sources. By bringing all this data together, we can say for the first time, “I want to find compounds that are similar to this compound and that have annotations about this disease.” The flexibility to navigate all of these data sources is really powerful.
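As a rough illustration of the kind of question described above, the sketch below pairs a Cypher query string with a tiny in-memory stand-in for the graph. The labels (`Compound`, `Disease`), relationship types (`SIMILAR_TO`, `ANNOTATED_WITH`), property names, and sample data are all hypothetical; the interview does not describe the actual Novartis schema.

```python
# Hypothetical Cypher for "compounds similar to this compound that have
# annotations about this disease". Labels, relationship types, and property
# names are assumptions for illustration, not the Novartis schema.
CYPHER_QUERY = """
MATCH (c:Compound {name: $compound})-[:SIMILAR_TO]-(other:Compound),
      (other)-[:ANNOTATED_WITH]->(d:Disease {name: $disease})
RETURN DISTINCT other.name AS compound
ORDER BY compound
"""

# Tiny in-memory stand-in for the graph, using placeholder names.
SIMILAR_TO = {("compound-A", "compound-B"), ("compound-A", "compound-C")}
ANNOTATED_WITH = {("compound-B", "disease-X"), ("compound-C", "disease-Y")}


def similar_compounds_for_disease(compound, disease):
    """Return compounds similar to `compound` that are annotated with `disease`."""
    # Treat similarity as undirected, as the Cypher pattern above does.
    neighbours = {b for a, b in SIMILAR_TO if a == compound}
    neighbours |= {a for a, b in SIMILAR_TO if b == compound}
    return sorted(n for n in neighbours if (n, disease) in ANNOTATED_WITH)


print(similar_compounds_for_disease("compound-A", "disease-X"))
```

The Python function only mirrors the shape of the traversal on toy data; against a real database, the query string would be sent through a Neo4j driver with `compound` and `disease` supplied as parameters.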

Knowing everything you know now, is there anything you would do differently?

Initially, we didn’t use the batch importer, which — now that we’re using it — works really well for us.

We’ve been using Neo4j since the beginning of the year and are still exploring how best to use the batch importer to get all of our data into the database. For us, it’s actually faster to rebuild the database with a bulk import from CSV files than to issue all of the delete statements that would otherwise be necessary. So, we wasted a little bit of time by not using the batch importer from the beginning.
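A minimal sketch of preparing CSV files for such a bulk import, assuming the header convention used by Neo4j's bulk import tool (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The column names, identifiers, and rows are hypothetical, and the exact header format should be checked against the documentation for the Neo4j version in use.

```python
import csv
import io


def write_csv(header, rows):
    """Serialise a header row and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Node file: one row per node, with a unique ID and a label.
# The header fields are assumed to follow the bulk importer's convention.
nodes_csv = write_csv(
    ["id:ID", "name", ":LABEL"],
    [["c1", "compound-A", "Compound"], ["d1", "disease-X", "Disease"]],
)

# Relationship file: each row links two node IDs with a relationship type.
relationships_csv = write_csv(
    [":START_ID", ":END_ID", ":TYPE"],
    [["c1", "d1", "ANNOTATED_WITH"]],
)

print(nodes_csv)
print(relationships_csv)
```

With this approach, refreshing the data means regenerating the files and importing them into a fresh, empty database, rather than deleting and re-creating records in place.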

Is there anything else you’d like to add or say?

For us, this is just the beginning, but we’re already getting some really good results. We’re also going to start testing the scalability of Neo4j pretty soon: we currently have half a billion relationships in the database, and we’ll easily triple that as we add more data. So far, we’re very happy with the performance and look forward to seeing how it holds up with these added relationships.