The DecoGen Informatics Platform contains three key components: the HAP Database, the DecoGen software suite, and an interface to SAS for performing association analysis. The DecoGen software suite comprises several applications that all use the same set of libraries and access the HAP Database.

Genaissance Pharmaceuticals Inc. realized early on that the cost of genotyping was turning pharmacogenomics into a numbers game, and so it became a leading proponent of and expert on haplotype mapping: the identification of heritable "sets" of linked SNPs.

"There is no other company that has done a better job on the scientific front," says DzGenes CEO Terrence Kungel. "They were tireless advocates of the haplotype concept."

Haplotype maps significantly reduce the number of SNPs that researchers need to survey. The company's core resource is the HAP Database, which contains its haplotypes, as well as other genetic data, including appropriate parts of GenBank. The DecoGen software platform pulls everything together.
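To see why haplotypes cut down the number of SNPs worth surveying, consider a toy case. The sketch below is illustrative only: the haplotype names, the sequences, and the brute-force search are assumptions made for this example, not Genaissance's method. It looks for the smallest set of SNP positions that still tells every known haplotype apart.

```python
from itertools import combinations

# Hypothetical haplotypes for one gene: each string is the allele at 5 SNP sites.
haplotypes = {
    "HAP1": "ACGTA",
    "HAP2": "ACGCA",
    "HAP3": "GTGTA",
}

def minimal_tag_snps(haps):
    """Return the smallest tuple of SNP positions that distinguishes all haplotypes.

    Brute force over subsets in increasing size; fine for a toy example,
    far too slow for real genes with many SNPs.
    """
    n_sites = len(next(iter(haps.values())))
    for size in range(1, n_sites + 1):
        for subset in combinations(range(n_sites), size):
            signatures = {tuple(seq[i] for i in subset) for seq in haps.values()}
            if len(signatures) == len(haps):
                return subset
    return None  # only reached if two haplotypes are identical

tags = minimal_tag_snps(haplotypes)
print(f"{len(tags)} of 5 SNPs are enough to identify each haplotype: {tags}")
```

In this made-up case two of the five sites carry all the information, so only those two would need to be genotyped.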

In designing the system, the first priority was to bring the right tools to the job, says Richard Judson, senior vice president of informatics at Genaissance. "We wanted to harness the power of SAS, but provide it in a way that was accessible for all of our scientists, even those who were less statistically sophisticated," he says. "DecoGen is kind of a preprocessor, which brings together the genetic data and clinical markers and hands that to SAS." Most other genetic association analysis software, he says, offers fewer statistical techniques than are available through a bona fide statistical package like the one SAS offers.

Researchers could get the same results just by writing SAS programs, but having it set up this way "makes it easier for us to ask a lot of genetic questions very quickly," Judson says.
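The preprocessing role Judson describes can be pictured with a short sketch. Everything in it is assumed for illustration: the subject IDs, the `ldl_change` endpoint, the field names, and the CSV hand-off are hypothetical and are not DecoGen's actual data model or its interface to SAS. The idea is simply to join each subject's haplotype pair with a clinical measurement and emit one flat table that a statistics package could read.

```python
import csv
import io

# Hypothetical inputs: a haplotype pair per subject for one gene,
# and one clinical endpoint per subject.
haplotypes = {
    "S001": ("HAP1", "HAP2"),
    "S002": ("HAP1", "HAP1"),
    "S003": ("HAP2", "HAP3"),  # no clinical record, so dropped
}
clinical = {
    "S001": {"ldl_change": -42.0},
    "S002": {"ldl_change": -31.5},
    "S004": {"ldl_change": -18.0},  # no genotype, so dropped
}

def build_analysis_table(haplotypes, clinical, target_hap):
    """Join genetic and clinical data for subjects present in both sources."""
    rows = []
    for subject_id in sorted(haplotypes.keys() & clinical.keys()):
        pair = haplotypes[subject_id]
        row = {"subject_id": subject_id, "copies": pair.count(target_hap)}
        row.update(clinical[subject_id])
        rows.append(row)
    return rows

def to_csv(rows):
    """Serialize the table as CSV text, one row per subject."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

table = build_analysis_table(haplotypes, clinical, "HAP1")
print(to_csv(table))
```

Coding each subject by the number of copies of a haplotype (0, 1, or 2) is one common convention; the article does not say which coding DecoGen uses.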

The informatics must also evolve as rapidly as the science. Once Genaissance launched its STRENGTH trial of approximately 700 patients, it had to decide whether to incorporate the trial data into the main database. "We have this discussion regularly, and we always end up keeping it separate," Judson says. The company has separate clinical databases for each clinical program, mostly because "if real patient identifiers are still there, you have to limit access to that database."

The other reason is that statisticians tend to have a different approach to the analyses than most biologists and do a lot of preprocessing on clinical data. "Statisticians are constantly building small, specialized data sets to do particular analyses," Judson says. "If the clinical data was in the HAP Database they would lose the flexibility to create mini-data sets."

The idea of "loose coupling" is one of the philosophies driving Judson's team. "Trying to build a single monolithic system to handle all of the data and analyses is too complex. Instead, we break the system into a small number of independent applications that talk to one another through well-defined interfaces," Judson says.
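A minimal sketch of that loose-coupling idea follows. The class and method names are invented for illustration and do not come from DecoGen. The application depends only on a small interface, so the statistics engine behind it (SAS, in Genaissance's case) could be swapped without touching the caller.

```python
from statistics import mean
from typing import Protocol, Sequence

class StatsBackend(Protocol):
    """Hypothetical interface: compare an endpoint between two groups."""
    def effect(self, carriers: Sequence[float], non_carriers: Sequence[float]) -> float: ...

class MeanDifferenceBackend:
    """Stand-in backend; a real system might delegate to an external package."""
    def effect(self, carriers, non_carriers):
        return mean(carriers) - mean(non_carriers)

class AssociationApp:
    """Knows how to split subjects by haplotype, not how the statistics are done."""
    def __init__(self, backend: StatsBackend):
        self.backend = backend

    def run(self, copies, endpoint):
        carriers = [y for c, y in zip(copies, endpoint) if c > 0]
        non_carriers = [y for c, y in zip(copies, endpoint) if c == 0]
        return self.backend.effect(carriers, non_carriers)

app = AssociationApp(MeanDifferenceBackend())
result = app.run([1, 2, 0, 0], [-40.0, -30.0, -20.0, -10.0])
print(result)  # difference in mean endpoint, carriers minus non-carriers
```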

Building the system has been a learning process as well. "The first thing to realize is that trying to do everything yourself is a mistake," he says. "If there is something out there that you can use and integrate in, like SAS, that's the best thing to do."

Analyzing that kind of data is one of the toughest jobs in genomics today. "We are analyzing a number of big data sets successfully," Judson says. "It does what we want it to do."

There is room for improvement, however. "The current architecture has some performance problems, and we are re-architecturing it to be more of a Web services system," he says. DecoGen is currently a fat client that runs on the user's desktop, where it consumes a lot of resources. "If it is more server-based you could more easily have a big cluster behind the scenes," Judson says.

Customers have had the same performance issues, and next January Genaissance aims to release version 5.0, which should eliminate them. The first piece of that new system is RuleFinder, which currently runs on a cluster of 20 Pentium III boxes.

Genaissance also licenses the HAP Database and the DecoGen software platform. The product is usually installed in a client-server mode, with the HAP Database (which uses Oracle) residing on the server. It is accessed using the DecoGen software, which runs on a desktop client. The company is also considering licensing just the software. All of Genaissance's systems run on standard platforms, including Windows NT, other Windows operating systems, and UNIX.

The user interface for DecoGen RuleFinder, a new tool in the DecoGen suite. RuleFinder is a high-throughput application for evaluating associations between gene haplotypes and clinical endpoints. The top panel displays a matrix of potential associations between genes (left to right) and clinical endpoints (top to bottom). The significance of an association is indicated by the color of the rectangle at the intersection of a particular gene and endpoint. Red indicates a highly significant association and blue a marginally significant one; uncolored cells are not significant. P-values used for determining the color are corrected for multiple comparisons. The bottom panel shows summary information for each of the significant associations. The user can drill down to get more detailed information about each association.
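The coloring scheme in the caption can be sketched in a few lines. The caption does not say which multiple-comparison correction RuleFinder applies or where the color cutoffs fall, so the Bonferroni correction and the 0.01 and 0.05 thresholds below are assumptions, as are the gene and endpoint names and the p-values.

```python
# Hypothetical raw p-values for each (gene, endpoint) cell of the matrix.
raw_p = {
    ("GENE_A", "endpoint_1"): 0.0005,
    ("GENE_A", "endpoint_2"): 0.20,
    ("GENE_B", "endpoint_1"): 0.010,
    ("GENE_B", "endpoint_2"): 0.75,
}

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return {cell: min(1.0, p * m) for cell, p in p_values.items()}

def cell_color(p_corrected, strong=0.01, marginal=0.05):
    """Map a corrected p-value to a cell color; thresholds are assumed."""
    if p_corrected < strong:
        return "red"
    if p_corrected < marginal:
        return "blue"
    return None  # not significant, cell left uncolored

corrected = bonferroni(raw_p)
colors = {cell: cell_color(p) for cell, p in corrected.items()}

for (gene, endpoint), color in colors.items():
    print(f"{gene} x {endpoint}: {color or 'uncolored'}")
```

Bonferroni is the simplest such correction and is conservative; with many genes and endpoints, a less strict method might color more cells.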