Novartis’ Answer to Harry Potter

By John Russell

Dec. 2006 / Jan. 2007 | Basel, Switzerland — Imagine a magic wand able to instantly materialize novel, safe, and effective drugs. No such device exists, of course, and perhaps never will, but a distant cousin of sorts is being rolled out at Novartis.

“What I’d really like to have,” says Manuel Peitsch, global head of systems biology at Novartis, “is some form of a magic stick, some Harry Potter thing, where I just click on this particular word and all the books that talk about it fly out of the book shelves, open on the right page, and are sitting there for me to go and read.”

Who wouldn’t?

In fact, Peitsch and a few sorcerer colleagues have spent the last three years conjuring up just such a magic stick. Called UltraLink (UL), it is an immensely powerful text-mining tool and knowledge management platform that is rolling out to select users now and will eventually be deployed throughout Novartis.

Obviously, there are lots of search engines and text-mining tools already available (see “Search and Deploy,” Bio-IT World, October 2006, pp. 24-33). Peitsch, UltraLink’s chief wizard and chief evangelist, believes few if any match UL’s breadth. When a UL result page is loaded, an expert system reads and categorizes its contents. This allows pertinent pages to be selected using a treelike representation of the extracted concepts and entities. The system then creates UltraLinks, which associate each extracted entity with a set of meaningful links to other databases and applications.

This dry description belies UL’s ease of use and power. Key entities (genes, companies and institutions, diseases and indications, and so on) are color-coded by category and highlighted on returned pages. Gene names, for example, are yellow, institutions are red, and compounds are blue. Rather than a difficult-to-read document choking on a jumble of colors, the results pages are surprisingly accessible and uncluttered (see figure). Drilling deeper is simple: just click. Connecting data to other applications is also straightforward.

“If I’m in research and I want to do this [text search] and then want to use a tool to create a network around one of the genes mentioned in the text, I can do that with a couple of mouse clicks. If I’m in marketing and I want to look at what the competitive circumstances around a particular product are, I can do that, as well. It’s a ubiquitous all-purpose tool,” says Peitsch, who, like any parent, is both proud and perhaps a little nervous about UL’s reception.
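UL’s curated terminologies are proprietary and far richer than anything that fits here, but the basic idea of the color-coded entity tagging described above (match known terms, assign each one a category and a color, attach links to other resources) can be sketched in a few lines of Python. The terminology entries, the `gene-db:`-style link strings, and the names `tag_entities` and `highlight` are hypothetical illustrations, not UL’s data or API; only the yellow/red/blue color scheme comes from the article.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    category: str
    links: list = field(default_factory=list)  # hypothetical link targets


# Hypothetical miniature terminology; keys are lower-cased surface forms.
TERMINOLOGY = {
    "munc13-1": Entity("Munc13-1", "gene", ["gene-db:Munc13-1"]),
    "novartis": Entity("Novartis", "institution", ["org-db:Novartis"]),
    "university of texas": Entity("University of Texas", "institution",
                                  ["org-db:University_of_Texas"]),
    "imatinib": Entity("imatinib", "compound", ["compound-db:imatinib"]),
}

# Color per category, following the scheme described in the article.
CATEGORY_COLORS = {"gene": "yellow", "institution": "red", "compound": "blue"}

# One case-insensitive pattern; longer terms come first so they win over prefixes.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in
                      sorted(TERMINOLOGY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def tag_entities(text):
    """Return (surface form, category, color, links) for each known term."""
    hits = []
    for m in PATTERN.finditer(text):
        entity = TERMINOLOGY[m.group(1).lower()]
        hits.append((m.group(1), entity.category,
                     CATEGORY_COLORS[entity.category], entity.links))
    return hits


def highlight(text):
    """Wrap each known term in an HTML span colored by its category."""
    def repl(m):
        entity = TERMINOLOGY[m.group(1).lower()]
        color = CATEGORY_COLORS[entity.category]
        return f'<span style="background:{color}">{m.group(1)}</span>'
    return PATTERN.sub(repl, text)
```

For example, `highlight("Munc13-1")` returns `'<span style="background:yellow">Munc13-1</span>'`. Plain dictionary matching like this is only a starting point; a production system would also need synonym lists and context-dependent disambiguation of ambiguous terms.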

Taking a Test Spin

At a show-and-tell session held in his Basel office, Peitsch glides quickly through examples. In one instance, he burrows into public documents to dig out information on a University of Texas stem cell therapy project, uncovering attributes that range from related patents to chemistry processes, and even finding its clinical protocols. Switching gears, he rapidly digs out a wealth of information about Munc13-1 and imports the data directly into MetaCore, GeneGo’s pathway database, to identify and display putative Munc13-1 pathways.

The real secret under the “magic hood” consists of painstakingly curated and maintained terminologies, ontologies, and rules engines. UL lives on the Novartis Knowledge Space Portal (KSP) and is invoked as a web service from desktops. Semantic Web enabling tools were not yet readily available when the project began, but Peitsch’s team has since started incorporating Semantic Web technology.

The system “understands” enough biology, medicinal chemistry, and medicine to distinguish between many similar terms by context. Consider, for example, the abbreviation MS. It could stand for multiple sclerosis, Microsoft, or Mississippi. UL categorizes the document first, and that classification then helps it identify the terms the document contains more consistently.
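The categorize-first strategy for a term like MS can be illustrated with a deliberately simple keyword-scoring classifier. The cue-word lists, the three category names, and the functions `categorize` and `expand` are hypothetical; UL relies on curated ontologies and rules engines, not on this scoring scheme.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical cue words that hint at a document's category.
CATEGORY_CUES = {
    "medicine": {"patient", "patients", "clinical", "relapse", "lesions", "neurology"},
    "technology": {"software", "windows", "computer", "operating"},
    "geography": {"state", "river", "county", "governor", "delta"},
}

# Sense of each ambiguous abbreviation, keyed by document category.
ABBREVIATION_SENSES = {
    "MS": {
        "medicine": "multiple sclerosis",
        "technology": "Microsoft",
        "geography": "Mississippi",
    },
}


def categorize(text: str) -> Optional[str]:
    """Step 1: assign the whole document to the category with the most cue words."""
    scores = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for category, cues in CATEGORY_CUES.items():
            if word in cues:
                scores[category] += 1
    if not scores:
        return None  # no evidence; leave the abbreviation unresolved
    return scores.most_common(1)[0][0]


def expand(abbreviation: str, text: str) -> Optional[str]:
    """Step 2: resolve the abbreviation using the document's category."""
    category = categorize(text)
    if category is None:
        return None
    return ABBREVIATION_SENSES.get(abbreviation, {}).get(category)
```

For instance, `expand("MS", "Patients with MS had fewer relapse events in the clinical trial.")` returns `"multiple sclerosis"`. Classifying the document once and reusing that category for every ambiguous term keeps the interpretation consistent across the whole text, rather than deciding each occurrence in isolation.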

“The reason we have [UL] as part of systems biology is that systems biology needs a comprehensive list of the parts of the system, as well as their interactions,” he says. “A lot of that information is available in the text. So, anything that touches really advanced text computing is for us an integral part of the systems biology concept.”

Prior to leading the new systems biology department, Peitsch was CIO for Novartis Research, covering IT infrastructure and informatics. He helped lead the charge to go paperless in the library area, a goal largely accomplished over the past five years. “We closed a number of physical libraries because they’re less and less visited. Basically the desktop is your library,” he says. This move from paper to electronic media also enabled Peitsch to create synergies between text mining and the library, and to leverage those technologies to analyze publications, patents, and other information sources.

UltraLink is perhaps the most advanced piece of Novartis’ multifaceted systems biology initiative. Peitsch leads the formal systems biology group, which is part of the Novartis research organization. Under its umbrella are text mining (led by Thérèse Vachon), computational systems biology (led by Carolyn Cho), and proteomics (led by Jan van Oostrum). Don Stanski leads a modeling and simulation group that is part of the Novartis development organization. Both groups collaborate with various disease research teams.

“Proteomics is a key component of my department,” says Peitsch, who is part of the genome and proteome sciences platform led by Mark Boguski. “We have a whole gamut of proteomics technologies at our [disposal], but our most recent addition is a reverse protein arrays platform.”

Last summer, Peitsch and several colleagues wrote a review, “The application of systems biology to drug discovery,” in Current Opinion in Chemical Biology. The review not only spelled out Novartis’ view of systems biology but also presented recent work using reverse protein arrays to capture the time-course data needed to characterize signaling pathways.

Here’s an excerpt from the paper: “Recent years have witnessed the development of genome-scale functional screens, large collections of reagents, protein microarrays, databases and algorithms for data and text mining. Taken together, they enable unprecedented descriptions of complex biological systems, which are testable by mathematical modeling and simulation...[I]t is their iterative and combinatorial application that defines the systems biology approach.”

Collaborative Databases

Data modeling is a fairly new activity in Peitsch’s group. “Taking all these facts, components of networks, how they interact, represent them as process maps and run simulations is where we want to go. We’ve been building up the group. The last member of the group joined in July and we are gearing up to apply these methods to our discovery projects,” he says.

“What we are aiming for in many ways is a new kind of ‘Signalome’ database, which integrates the nice picture of the pathways, the process maps, the data, the mathematical models, and [offers] more than one model to represent that knowledge,” says Peitsch.

The first objective, he says, is to run projects that successfully integrate computational and experimental approaches. His group will collaborate with disease areas; work is ongoing with oncology and autoimmunity researchers. This should help overcome one persistent problem: data sets often aren’t generated from a modeling perspective, which makes building appropriate models more difficult. Peitsch wants to reach the point where “mathematical modelers actually ignite the imagination of the experimentalist.”

“We are, of course, looking for targets and biomarkers and are analyzing compounds through systems response profiling using reverse arrays,” he says. “The latter can lead to the identification of off-target effects of compounds, and to an understanding of the compound mode of action in a pathway/network context.”

Systems biology has long suffered from a reputation as misguided alchemy. Peitsch understands this: “The major challenge is, as with any new approach, to prove the value and demonstrate clear benefits. Mathematical modeling is not broadly accepted yet, and many challenges are before us to show that these types of approaches can shed new light on biology.”

Prompting company-wide adoption of UL will be a good first step, and Peitsch is devoting much of his time to making that happen. Currently, UL has about 300 users and gets 4,000 to 5,000 requests per month.

Like many researchers, Peitsch’s computer training took place mostly outside the classroom. As an undergrad, he flirted with astronomy and physics before double majoring in biochemistry and physical chemistry. He then earned a Ph.D. in biochemistry. Along the way he wrote application programs as needed, and recalls his father giving him his first programming assignment.

“I was 13, long before I started biology studies. My father is an engineer. He invented a lot of machines. So, basically one day he told me this is a computer, this is a book; they go together. I need this program. That’s how I started. So, early on I wanted to bring computing and biology together.”

He seems to have successfully made the transition from sorcerer’s apprentice to full-fledged wizard. Look out, Lord Voldemort!