Given the results of the DNA and RNA sequencing—the geyser Darnell mentioned earlier—Watson will figure out which mutations are distinct to the tumor, what protein networks they affect, and which drugs target proteins that are part of those networks.

Gee – why did nobody ever think of doing that?!

Fact is, this is already routinely done by squads “of highly trained geneticists, genomics experts, and clinicians”. The outcome: meagre, to put it mildly. I’m not sure what new thing Watson brings to the table. Maybe there is a real innovation here, but then the article failed to mention it. Manpower really isn’t the problem here – we know plenty of mutations that occur in cancers, as well as their effects on protein interaction networks, and even how to target these networks (in principle). But that only helps us in very limited ways.
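For a concrete (if cartoonish) picture of that routine workflow, here is a minimal sketch; the variant names, the gene-to-network mapping, and the drug lists are made-up stand-ins for real somatic variant callers, interaction databases, and drug-target resources:

```python
# Minimal sketch of the routine analysis described above (all mappings here
# are tiny illustrative stand-ins for real somatic-variant callers,
# interaction networks, and drug-target databases).
tumor_variants  = {"BRAF_V600E", "TP53_R175H", "EGFR_L858R"}
normal_variants = {"TP53_R175H"}                        # germline, not tumour-specific

gene_to_network  = {"BRAF": "MAPK signalling", "EGFR": "MAPK signalling"}
network_to_drugs = {"MAPK signalling": ["vemurafenib", "trametinib"]}

somatic = tumor_variants - normal_variants              # mutations distinct to the tumour
networks = {gene_to_network[v.split("_")[0]]
            for v in somatic if v.split("_")[0] in gene_to_network}
candidates = sorted({d for n in networks for d in network_to_drugs[n]})

print(somatic)      # the tumour-specific mutations
print(candidates)   # ['trametinib', 'vemurafenib'] -> candidate targets, not a treatment plan
```

The output is a list of candidate targets for follow-up, which is exactly why this step is not where the bottleneck sits.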

The article alludes to the fact that Watson can do these analyses immediately while a team of scientists takes a week. Actually, they take longer. But that’s not the issue here, because neither the team of scientists nor Watson currently ends up with an actionable treatment plan. At best it will result in a candidate target for follow-up drug screenings, which takes years. So the “week” that Watson cuts down on is simply not the bottleneck.

EDIT To clarify: the article makes it sound as if Watson is trying to solve a particular problem that is already solved – and which unfortunately has so far failed to yield many advances. And while I welcome every single automation that would make my job easier, this part is simply not a bottleneck; other parts are.

I am a genetic scientist who works in a clinical and research lab that is one of the few in the country to offer cancer sequencing and aCGH testing. The sequencing and aCGH data per patient run into the gigabytes... keep in mind that these are text files.

We currently use Cartagenia (http://www.cartagenia.com/), a tool that searches curated databases for DNA, RNA, and protein information and ultimately attempts to suggest how they all interact in the etiology of the cancer. It works by filtering the sequencing and aCGH data based on user-defined parameters. Making sense of the gigabytes of data per patient in light of what is found in these databases is difficult.
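To make the filtering step concrete, here is a minimal sketch of parameter-based variant filtering; the file layout, column names, and thresholds are hypothetical and only stand in for what a tool like Cartagenia actually does:

```python
# Minimal sketch of parameter-based variant filtering (hypothetical file
# layout and thresholds; real tools do far more).
import csv

def filter_variants(path, min_coverage=30, max_population_freq=0.01,
                    consequences=("missense", "frameshift", "splice_site")):
    """Yield variants that pass simple user-defined filters."""
    with open(path) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if int(row["coverage"]) < min_coverage:
                continue                  # too little read support
            if float(row["pop_freq"]) > max_population_freq:
                continue                  # common polymorphism, not tumour-specific
            if row["consequence"] not in consequences:
                continue                  # effect class we chose not to review
            yield row

# Usage: everything that survives still has to be interpreted by a human.
for variant in filter_variants("patient_variants.tsv"):
    print(variant["gene"], variant["consequence"])
```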

Using Watson, I would hope that these databases could be searched unfiltered and RAW to try to facilitate making connections between how RNA, DNA, and proteins interact. Novel aberrations in the genome could then be used to suggest disease progression and treatment based on such databases. We currently review all the scholarly articles in difficult cases, which requires a lot of time to read each article and essentially pick out the relevant pieces of information to apply to diagnosis and, eventually, treatment by the physician.

Personalized medicine has been happening for years, but making useful connections within these huge amounts of data has been very difficult to do with the current technology. Hopefully Watson can improve on how we make these connections.

"Using Watson, I would hope that these databases could be searched unfiltered and RAW"

How do you imagine that would work? Incidentally, I work in cancer research and I follow the same workflow as you guys, albeit manually rather than using something like Cartagenia (precisely because that allows more open-ended exploration).

What is the kind of information that you get from manual literature review that curated databases cannot give you? This seems to be the point where Watson would come in, but what does it provide over existing databases?

One personal example is that I am currently looking at a predicted splice variant that is roughly 80bp, which would normally have been cut out by typical databases since the filter applied requires a minimum size of 200bp. Using AceView there is some EST evidence, and part of my graduate research so far is to investigate this. In the project I am working on, we would ultimately like to determine whether that 80bp region is necessary and sufficient for transcription.

If these databases are cutting out pieces because we THINK that they are useless, we might end up disregarding a piece of the puzzle.
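As a toy illustration of that point, the sketch below shows how a hard minimum-size cutoff silently drops an 80bp feature that a looser threshold would keep for review (the feature names and lengths are invented for the example):

```python
# Toy illustration of how a hard size cutoff discards small features
# (names and lengths are made up for the example).
predicted_features = [
    {"id": "exon_A",       "length": 450},
    {"id": "exon_B",       "length": 210},
    {"id": "splice_var_X", "length": 80},   # the kind of 80bp feature discussed above
]

def keep(features, min_length):
    return [f["id"] for f in features if f["length"] >= min_length]

print(keep(predicted_features, 200))  # ['exon_A', 'exon_B']  -> the 80bp variant is lost
print(keep(predicted_features, 50))   # all three are retained for manual review
```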

I'm not a genetic scientist (but a software developer); however, don't you think that the sheer volume of information that can be perused by the software, versus the limited speed with which a human can access, read, assess, compute, etc., would be a prime benefit? Your post implies that the task is already being done (which it is) at what you feel is the prime speed for completion (which it cannot be at this time). It takes a fast reader (not a "speed reader") probably a few hours to finish a book of several hundred pages. A computer can peruse that same amount of content in seconds.

I also work in genetics/bioinformatics (for brain biology though, not cancer). I agree that current cancer treatments, no matter how well-targeted, are still not highly successful. But I think, from a cost-benefit perspective, teaching Watson to use genome and cancer databases might be a relatively simple co-opt of existing tech for large gain - this system could put medicine in a good position going forward to immediately make use of new advances in treatment when they become available. I think it's more useful as a tool-building venture with great potential, rather than a current "cancer solver."

I am also in the field. The main problem is that even with curated data sets, we do not actually know all protein-protein interactions, genetic interactions, different phosphorylated forms of a protein, etc. Also, in higher organisms there are different splice variants, cell types, and miRNAs that completely change how a genetic network functions. Not to mention that the majority of the data we have comes from laboratory conditions rather than noxious stress conditions (which cancer cells are typically in due to rapid metabolism). We do not even have all of this information yet for simpler organisms like yeast, making predictions very hard no matter the method.

If you think they are really trying to find a cure, you're mistaken; it's just an exercise to grow both fields. Sure, Watson most likely won't find anything groundbreaking, but it's sure as hell not going to make things worse, plus Watson's makers are going to get a chance to mess around and learn things as well. It's all about doing the research, not what you get from it.

I think /u/guepier isn't suggesting that non-cure-oriented research isn't worthwhile, rather that it isn't clear at all from the article what new ideas the Watson team is going to pursue, and that the things the article does mention aren't new ideas, nor were they infeasible before Watson.

The problem Watson is attacking is the toughest and most time-consuming part of dealing with DNA sequence data: combing through scientific publications to figure out what the proteins produced by genes suspected of causing cancer do. Right now, this is done by scientists, and it is both time-consuming and expensive. One recent study said the cost of analyzing a genome was $17,000. Any savings of time or cost would make the use of DNA sequencing more likely to be cost-effective. And this is in many ways a similar problem to learning to answer questions on Jeopardy.

Ajay Royyuru, director of the computational biology center at IBM Research, says that he hopes to bring the time it takes to do this kind of analysis down to “hours or even minutes.” More than that, he hopes that Watson will eventually allow researchers to make decisions based on more data than they could possibly integrate in their own minds — even bringing information from disparate fields.

“This is a problem we face as researchers,” he says. “We are experts in what we know. But we are not experts in what we don’t know. [Watson will] systematically gather evidence, and alert the expert. If you can do that systematically you are delivering enormous evidence to the expert that will help the expert function in a faster better manner.”

Hm. The description makes no sense. Cancer researchers analysing a genome don’t often comb through publications – they query extensive, curated databases! And that, by the way, is done automatically by software, not manually by a researcher (in most cases; some people do insist on combing the literature by hand).

Now it might be that Watson’s job is to help with database curation. That would indeed make sense, but it’s not what I’d take away from either article, and it’s also an incremental rather than a ground-breaking innovation: database curation is (of course) already computer-aided and done via automated text mining of publications.
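For a sense of what that automated text mining looks like in its crudest form, here is a sketch that flags abstracts co-mentioning a known gene and a known drug so a curator can review them; the gene and drug lists and the abstract are placeholders, and real curation pipelines use proper NLP, synonym dictionaries, and relation extraction rather than simple co-occurrence:

```python
# Crude sketch of text-mining-assisted curation: flag abstracts that
# co-mention a known gene and a known drug for curator review.
# (Gene/drug lists and the abstract are placeholders.)
import re

GENES = {"BRAF", "EGFR", "KRAS"}
DRUGS = {"vemurafenib", "erlotinib"}

def co_mentions(abstract):
    tokens = set(re.findall(r"[A-Za-z0-9]+", abstract))
    lowered = {t.lower() for t in tokens}
    genes = {g for g in GENES if g in tokens}          # gene symbols are case-sensitive
    drugs = {d for d in DRUGS if d in lowered}
    return [(g, d) for g in genes for d in drugs]

abstract = ("BRAF V600E melanoma cell lines were sensitive to vemurafenib "
            "in combination therapy ...")
print(co_mentions(abstract))  # [('BRAF', 'vemurafenib')] -> queue for curator review
```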

A database structure can only hold information the designers of that structure anticipated holding. Unstructured text can contain a lot more information that a reader can pick up on. But thanks for the helpful downvote.

Didn’t downvote you, I only downvote people who give wrong information.

That said, you seem to have an inaccurate idea of how these databases work. They don’t really impose any structure per se; they just give you information about (putative) connections between different entities in the body (in particular genes, their products, regulators, etc.), which (known) chemical targets they have, which (known) effects they have, which studies they turned up in, and (consequently) which tumour context they were found in.

That’s pretty open-ended concerning what questions can be asked with it – I’d go as far as saying that it presents exactly the same (relevant) information as the original publication. Now, it’s of course possible that I (and every other cancer researcher on the planet) miss some connection here which Watson would be able to find. But that’s seriously grasping at straws, and I doubt that this is what the IBM folks mean.
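To illustrate the kind of record I mean, here is a rough sketch of such open-ended, provenance-carrying links; the entity names, fields, and PubMed IDs are illustrative only, not taken from any specific database:

```python
# Rough sketch of the kind of record such databases expose: typed links
# between entities, each carrying its provenance, rather than a fixed set of
# pre-decided questions. (Entities, fields, and IDs are illustrative only.)
interactions = [
    {"source": "BRAF", "target": "MEK1",        "relation": "phosphorylates",
     "evidence": "experimental", "pmid": "PMID:0000000", "context": "melanoma"},
    {"source": "BRAF", "target": "vemurafenib", "relation": "inhibited_by",
     "evidence": "experimental", "pmid": "PMID:0000001", "context": "melanoma"},
]

def neighbours(entity):
    """Ask an open-ended question: what is this entity connected to, and why?"""
    return [(i["relation"], i["target"], i["pmid"])
            for i in interactions if i["source"] == entity]

print(neighbours("BRAF"))
```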

Databases, as they are, don't deal with uncertainty and conflicting information in a coherent way. For example, many databases store a single piece of data and just attach some metadata like "electronic inference" or "experimentally derived". A computer could build a model of uncertainty from all the sources of information and allow questions to be asked against it.
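As a sketch of what such a model of uncertainty could look like, the snippet below weights each assertion by its evidence code and combines independent sources; the weights and the independence assumption are placeholders, not how any particular database actually scores evidence:

```python
# Sketch of "modelling uncertainty": weight each assertion by its evidence
# code and combine independent sources, instead of storing one flat record.
# (Weights and the independence assumption are placeholders.)
EVIDENCE_WEIGHT = {
    "experimentally_derived": 0.90,
    "author_statement":       0.70,
    "electronic_inference":   0.40,
}

def combined_confidence(evidence_codes):
    """Probability that at least one source is right, assuming independence."""
    p_all_wrong = 1.0
    for code in evidence_codes:
        p_all_wrong *= 1.0 - EVIDENCE_WEIGHT[code]
    return 1.0 - p_all_wrong

# One electronic prediction vs. the same claim backed by three source types:
print(round(combined_confidence(["electronic_inference"]), 2))        # 0.4
print(round(combined_confidence(["electronic_inference",
                                 "author_statement",
                                 "experimentally_derived"]), 2))      # 0.98
```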

Text mining is also a massive area of research, and you are wrong to think that the information in a journal article can be fully extracted into a database. If you think you, as a cancer researcher, have a grasp on the masses of related scientific literature out there beyond your chosen focus, you are probably mistaken.

The article doesn't seem to say anything substantial that researchers are not already doing. IBM just seems to be taking an interest, but I am sure they will produce some concrete results.