Microsoft Builds Open-Source Tool for Biologists Drowning in Data, an ‘On-Ramp’ for Customers Who Pay

The boss at the world’s leading maker of high-speed gene sequencing instruments, Illumina CEO Jay Flatley, pulled no punches a couple of weeks ago when asked about bioinformatics. This is the field in which people write software to help biologists store, analyze, and visualize the vast piles of genomic data that accumulate every day.

“If you look historically at bioinformatics companies, it’s road kill. There are almost no examples of very successful bioinformatics companies. People don’t want to pay for software,” Flatley said.

So while Illumina has built a company worth $5 billion on genetic analysis instruments, what does Microsoft want to do with software to manage data from those tools? The world’s biggest software maker (NASDAQ: MSFT) has been working for years to find an angle into biological research labs, without much to show for it. Labs around the world still use an aging spreadsheet program, Excel, to hold onto data the program was never meant to handle. Many individual labs cook up customized “home-brew” software programs on their own, figuring their experiments require something special. The customer base for each of those home-brew programs? About 15 to 20 people in an individual lab.

Obviously, this is a fragmented market that doesn’t lend itself to a one-size-fits-all program for the masses like Word or Excel. Yet Microsoft is well aware of the terabytes of genomic data piling up in labs, and of the potential for IT to sift through that data in ways that could advance personalized medicine and make healthcare more efficient. About a year ago, it rolled out a program called Amalga Life Sciences, which it hopes will get all the different IT programs talking to each other so biologists can start coping with their information overload.

But Microsoft has a lot more going on for biologists than what’s being packaged into Amalga Life Sciences. So I was curious to hear from Simon Mercer, director of health and well-being at Microsoft Research, when I visited his office in Redmond, WA, a few weeks ago.

I started off by telling him that I’m a biotech journalist who knows a lot more about cancer drugs than software development. Sure enough, when he used the term ALS, I immediately thought of amyotrophic lateral sclerosis, otherwise known as Lou Gehrig’s disease. To him, it means Amalga Life Sciences. Luckily, Mercer appears to have patience for the challenges of translating between biology jargon and computer jargon.

“If you have any questions about shrew chromosomes, I’m your man,” says Mercer. “I’m a biology refugee in a computer science company.”

Mercer, who has a doctorate in zoology, worked as director of software engineering at Ann Arbor, MI-based Gene Codes before moving to Microsoft Research in 2005. The vision he’s working on now is an open-source platform, still in early development, called the Microsoft Biology Foundation.

The idea is a pretty simple one. A typical biology lab has its own bioinformatics specialist, usually a postdoc who got interested after dabbling in software code, Mercer says. This person serves a small customer base, and their applications must change constantly to keep pace with new experiments and to capture data from instruments that quickly go out of date.

That is a lot of work for what amounts to a hodgepodge of tiny, short-term markets. Even at a behemoth like Microsoft, “we couldn’t possibly build all the applications,” Mercer says.

So how can Microsoft get a foothold in the biology lab? The Microsoft Biology Foundation aims to be an open-source platform that biologists can download from the Web and that contains common code most biologists need, Mercer says. Essentially, it’s meant to provide a template on which individual researchers can build their custom applications. It offers a range of algorithms for manipulating DNA, RNA, and protein sequences, and a set of connectors to publicly available resources on the Web, like the National Center for Biotechnology Information’s Basic Local Alignment Search Tool (BLAST). By supplying this open-source code, the platform ought to save biologists the time and money they would otherwise spend writing code to connect to tools like BLAST, resources better applied to the killer experiments they want to do, Mercer says.

The Microsoft Biology Foundation “is like a Swiss Army knife of bioinformatics,” Mercer says.

The software platform is still in beta. Mercer was pretty pleased to note that Bellevue, WA-based Aditi Technologies had recently introduced a program for biologists called “DNA PReDUST,” which was built on top of the platform. Collaborators at Cornell University, Queensland University of Technology, and the University of Texas are helping build the platform, he says. His research group at Microsoft, about 15 people, also has commercial partners he couldn’t name. But he wasn’t trying to make the platform out to be more than it is. By mid-April the program had been downloaded about 1,300 times, following an initial round in which researchers were invited to test-drive it and a more public appearance at Microsoft’s annual TechFest in March.

“We’re not trying to run before we can walk here,” Mercer says.

There are plenty of other options for biologists who want open-source platforms to handle much of their bioinformatics grunt work. BioPerl, BioRuby, and Biopython are just a few. Not surprisingly, Microsoft hopes its offering will stand out because it hooks up easily to Excel, where a lot of biologists keep their data. And, naturally, Mercer and his team are thinking about ways to connect the Microsoft Biology Foundation with the proprietary program Microsoft hopes to make money from: Amalga Life Sciences. I didn’t have time to ask about this in great detail with the point person for Amalga Life Sciences, Jim Karkanias. But Karkanias did explain in a short note how his group should be able to work with the Microsoft Biology Foundation.

“Since MBF is positioned on the boundary between the world of predominantly academic small-scale, open-source software and the world of large-scale enterprise applications, it can act as one of our on-ramps for conducting life science research on the Amalga Life Sciences platform,” Karkanias says.

As much as Microsoft might like to one day dominate bioinformatics the way Illumina dominates sequencing instruments, it just doesn’t sound like a market that anybody can dominate, at least not yet.

“We understand it’s a heterogeneous community, and scientists should use the best software tools, not necessarily those that come from one software developer,” Mercer says. “We don’t know where the research will go next, so we don’t want to lock people in. You can’t lock people in.”