How big data is taking on breast cancer — and big biotech

Radiation and chemotherapy aren’t the only tools the medical world is using in the fight against breast cancer. To better understand the genetic mutations associated with the cancer, researchers are increasingly turning to another field: data science.

Last month, the Supreme Court ruled that naturally occurring human genes cannot be patented. The decision means that the company on the losing end, Myriad Genetics (s MYGN), can no longer hold patents on the so-called BRCA1 and BRCA2 genes; mutations in either gene can indicate an elevated risk of breast and ovarian cancer. But even though it lost, the company is not required to release the years of genetic data and interpretations it has amassed, which could help doctors diagnose and treat patients.

So a group of policy makers, advocacy organizations and academic institutions has launched a new initiative called Free the Data! Its goal is to develop a public database of genetic information that doctors, patients and researchers can use to help women with genetic mutations better understand their risk – without relying on Myriad.

On Tuesday, it got a high-tech assist from Syapse, a big data startup that is trying to make it easier for researchers and doctors to mine complicated and dense data sets of genomic information.

Fighting private ownership of genome data interpretations

“A lot of what’s going on today is that there are companies or entities trying to own the interpretation of genome data. A company will discover that this gene is associated with this disease or some derivative thereof and they’ll keep it private. And that’s a huge problem for human health and figuring out how to diagnose and treat patients,” said Syapse founder and president Jonathan Hirsch. The company didn’t disclose the details of its financial arrangement with Free the Data! but said it relaxed its typical software-as-a-service terms for the non-profit.

Most efforts at crowdsourcing the interpretation of the genome have just had “bad software,” he said. They rely on basic spreadsheets, wikis or, if they’re lucky, relational databases. But when it comes to sifting through mounds of genomic data, clinical data and patient information, and then drawing correlations and connections across all of it, those kinds of systems are often insufficient.

Syapse, which my colleague Derrick Harris has likened to a “Salesforce (s CRM) for our genomes,” ingests reports on patients’ genetic information and turns them into structured data. Then, using semantic analysis and other processing methods, it organizes and clusters that data by concept and other categories to help with searches and data mining. It also annotates the genetic information with relevant patient information, including medical histories, pathogenic outcomes, treatment paths and responsiveness to drugs.
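To make the idea of structured, annotated genetic data more concrete, here is a minimal Python sketch. It is not Syapse's software or schema: the `VariantRecord` fields and the `annotate`, `cluster_by_gene` and `search` helpers are hypothetical names invented for illustration, and a simple grouping by gene stands in for the semantic analysis described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VariantRecord:
    """One structured entry pulled from a patient's genetic report (hypothetical schema)."""
    patient_id: str
    gene: str            # e.g. "BRCA1"
    variant: str         # free-form variant label taken from the report
    interpretation: str  # e.g. "pathogenic", "benign", "uncertain"
    # Patient context: medical history, treatment path, drug response, and so on.
    annotations: dict = field(default_factory=dict)


def annotate(record, **patient_info):
    """Attach patient context to a record and return the same record."""
    record.annotations.update(patient_info)
    return record


def cluster_by_gene(records):
    """Group records by gene; a simple stand-in for concept-based clustering."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record.gene].append(record)
    return dict(clusters)


def search(records, gene, interpretation=None):
    """Return records for a gene, optionally filtered by interpretation."""
    return [
        r for r in records
        if r.gene == gene
        and (interpretation is None or r.interpretation == interpretation)
    ]
```

With records in this shape, a question such as "which patients carry a BRCA1 variant interpreted as pathogenic?" becomes a single call to `search(records, "BRCA1", "pathogenic")`, rather than a manual pass through spreadsheets or wiki pages.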