
Abstract:

A computational systems pharmacology framework consisting of statistical
modeling and machine learning based on comprehensive integration of
systems biology data, including drug target data, protein-protein
interaction (PPI) networks, gene ontology (GO) annotations, and
reported drug side effects, can predict drug toxicity or adverse drug
reactions (ADRs). Biomolecular network and gene annotation information
can significantly improve the accuracy of ADR prediction for drugs under
development. The use of PPI networks can increase prediction specificity,
and the use of GO annotations can increase prediction sensitivity.

Claims:

1. A toxicity analysis tool comprising: a patient analysis module
configured to obtain gene expression information about a particular
patient; a database module configured to provide a set of targets for
known interactions of a particular drug; a network interaction module
configured to expand said set of targets based on network interaction
information to produce an expanded set of targets; and a toxicity module
configured to determine if a toxicity reaction is likely based on said
expanded set of targets, said toxicity module outputting an evaluation of
the likelihood of toxicity for the particular drug with the particular
patient.

2. The toxicity analysis tool of claim 1 wherein said patient analysis
module is also configured to obtain at least one of RNA, DNA, protein,
and metabolite information.

3. The toxicity analysis tool of claim 1 wherein said database module
includes at least one of drug and drug target information and drug side
effect information.

6. The toxicity analysis tool of claim 1 wherein said toxicity module
includes a prediction model configured to execute at least one of
support vector machine software and logistic regression analysis
software.

7. The toxicity analysis tool of claim 1 wherein said expanded set of
targets includes feature information associated with each target, and
said tool further includes a feature selection module configured to
remove elements of said expanded set of targets based on said feature
information.

8. The toxicity analysis tool of claim 7 wherein said feature selection
module is configured to filter said expanded set of targets based on
associated feature information having a p-value under a predetermined
value.

9. The toxicity analysis tool of claim 8 wherein said predetermined value
is about 0.05.

10. The toxicity analysis tool of claim 1 further including a
cross-validation module configured to balance said expanded set of
targets.

11. The toxicity analysis tool of claim 10 wherein said cross-validation
module partitions said expanded set of targets into a plurality of
training sets and a testing set, and said cross-validation module
balances said plurality of training sets.

12. A method of determining toxicity including the steps of: obtaining
gene expression information about a particular patient; accessing at
least one database and extracting a set of targets for known interactions
of a particular drug; expanding the set of targets based on network
interaction information to produce an expanded set of targets; and
determining if a toxicity reaction is likely based on said expanded set
of targets, said determining step including outputting an evaluation of
the likelihood of toxicity for the particular drug.

13. The toxicity determination method of claim 12 further including a
step of obtaining at least one of gene expression information and
metabolite information of a particular patient, and said determining step
further evaluates toxicity based on the particular patient.

14. The toxicity determination method of claim 12 wherein said accessing
step includes accessing at least one of drug and drug target information
and drug side effect information.

17. The toxicity determination method of claim 12 wherein said
determining step includes executing at least one of support vector
machine software and logistic regression analysis software.

18. The toxicity determination method of claim 12 wherein the expanded
set of targets includes feature information associated with each target,
and said method further includes removing elements of the expanded set of
targets based on feature information.

19. The toxicity determination method of claim 18 wherein said removing
step includes filtering the expanded set of targets based on associated
feature information having a p-value under a predetermined value.

20. The toxicity determination method of claim 19 wherein the
predetermined value is about 0.05.

21. The toxicity determination method of claim 12 further including the
step of cross-validation by balancing the expanded set of targets.

22. The toxicity determination method of claim 21 wherein said
cross-validation step includes partitioning the expanded set of targets
into a plurality of training sets and a testing set, and said
cross-validation step includes balancing said plurality of training sets.

[0005] Over 1500 Mendelian conditions whose molecular cause is unknown are
listed in the Online Mendelian Inheritance in Man (OMIM) database.
Additionally, almost all medical conditions are in some way influenced by
human genetic variation. The identification of genes associated with
these conditions is a goal of numerous research groups, in order to both
improve medical care and better understand gene functions, interactions,
and pathways. Sequencing large numbers of candidate genes remains a
time-consuming and expensive task, and it is often not possible to
identify the correct disease gene by inspecting the list of genes
within a mapped disease interval.

[0006] A number of computational approaches toward candidate-gene
prioritization have been developed that are based on functional
annotation, gene-expression data, or sequence-based features.
High-throughput technologies have produced vast amounts of
protein-protein interaction data, which represent a valuable resource for
candidate-gene prioritization, because genes related to a specific or
similar disease phenotype tend to be located in a specific neighborhood
in the protein-protein interaction network. However, only relatively
simple methods for exploring biological networks have been applied to the
problem of candidate-gene prioritization, such as the search for direct
neighbors of other disease genes and the calculation of the shortest path
between candidates and known disease proteins.
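The shortest-path approach mentioned above can be sketched as follows, assuming an unweighted PPI network stored as an adjacency dictionary. The function names and the toy protein identifiers are hypothetical; this illustrates the general prior-art idea, not a specific published tool.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency dictionary from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph, sources):
    """Multi-source breadth-first search: number of interaction steps from
    the nearest source protein to every reachable protein."""
    dist = {s: 0 for s in sources if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(graph, candidates, disease_genes):
    """Rank candidate genes by distance to the closest known disease protein;
    candidates with no path to any disease protein are ranked last."""
    dist = shortest_path_lengths(graph, disease_genes)
    return sorted(candidates, key=lambda c: (dist.get(c, float("inf")), c))
```

A direct-neighbor search is the special case that keeps only candidates at distance 1.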

[0008] Recent research on drug side effects has drawn attention to the
inadequacy of the traditional "one drug, one target, and causal effect"
model. Modern drugs are designed to regulate the functions of specific
target proteins, or "drug targets". Efficacious drugs can overcome the
barriers of absorption, distribution, metabolism, and excretion (ADME) to
achieve desirable "on-target" effects. However, drugs may also bind to
"off-target" proteins, potentially leading to unwanted side effects,
which range from mild drowsiness to deadly cardiotoxicity. More
appropriate models must be developed to take advantage of complex
molecular responses of drugs in cells, by fully exploiting the
relationships between chemical compounds, protein targets, and side
effects observed at the physiological level.

[0009] Systematic and quantitative investigation of adverse side effects
has become increasingly important due to rising concerns about the
cytotoxicity of drugs in development. Studies of drug toxicity and
unintended side effects can lead to improved drug safety and efficacy.
One promising strategy comes from molecular systems biology in the form
of "systems pharmacology". Although the relationship between systems
biology and drug toxicity had been recognized, there had been no
published report about how to practically predict drug toxicity by using
biomolecular interaction and/or annotation information.

[0011] In one embodiment, the present invention relates to a toxicity
analysis tool comprising patient analysis, database, network
interaction, and toxicity modules. The patient analysis module is
configured to obtain gene expression information about a particular
patient. The database module is configured to provide a set of targets
for known interactions of a particular drug. The network interaction
module is configured to expand said set of targets based on network
interaction information to produce an expanded set of targets. The
toxicity module is configured to determine if a toxicity reaction is
likely based on said expanded set of targets, said toxicity module
outputting an evaluation of the likelihood of toxicity for the particular
drug with the particular patient. The patient analysis module is also
configured to obtain metabolite information. The database module includes
at least one of drug and drug target information and drug side effect
information. The network interaction module uses a protein-protein
interaction network model, and also uses gene ontology information
including hierarchical terms, biological processes, cellular components,
and molecular functions. The toxicity module includes a prediction model
configured to execute at least one of support vector machine software and
logistic regression analysis software. The expanded set of targets
includes feature information associated with each target, and the tool
further includes a feature selection module configured to remove elements
of the expanded set of targets based on said feature information. The
feature selection module is configured to filter the expanded set of
targets based on associated feature information having a p-value under a
predetermined value, for example about 0.05. The tool further includes a
cross-validation module configured to balance the expanded set of
targets, for example by partitioning the expanded set of targets into a
plurality of training sets and a testing set, and then balancing the
plurality of training sets.
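The network interaction module's expansion step could look roughly like the sketch below. The one-hop expansion rule, the 0 to 1 confidence scale, the default threshold of 0.7, and the binary GO feature encoding are all assumptions made for illustration; the description above does not specify expansion depth, scoring scale, or feature encoding, and the function names are hypothetical.

```python
def expand_targets(targets, ppi_edges, min_confidence=0.7):
    """One-hop expansion of a drug's known targets through a PPI network.

    ppi_edges is an iterable of (protein_a, protein_b, confidence) tuples.
    Assumption: only interactions at or above min_confidence are used, and
    only direct interaction partners of the original targets are added."""
    seeds = set(targets)
    expanded = set(seeds)
    for a, b, confidence in ppi_edges:
        if confidence < min_confidence:
            continue  # skip low-confidence interactions
        if a in seeds:
            expanded.add(b)
        if b in seeds:
            expanded.add(a)
    return expanded


def go_feature_vector(proteins, go_annotations, go_terms):
    """Binary feature vector over go_terms: 1 if any protein in the
    (expanded) target set is annotated with that GO term, else 0."""
    annotated = set()
    for protein in proteins:
        annotated |= go_annotations.get(protein, set())
    return [1 if term in annotated else 0 for term in go_terms]
```

Raising `min_confidence` keeps fewer, better-supported interaction partners, which is one way the PPI confidence levels compared in FIGS. 5A and 5B could enter such a pipeline.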

[0012] In another embodiment, the present invention relates to a method of
determining toxicity. First is the step of obtaining gene expression
information about a particular patient. Then, at least one database is
accessed and a set of targets for known interactions of a particular drug
is extracted. The set of targets is expanded based on network
interaction information to produce an expanded set of targets. Whether a
toxicity reaction is likely is determined based on the expanded set of
targets, and an evaluation of the likelihood of toxicity for the
particular drug is output. A further step of obtaining at least one of gene expression
information and metabolite information of a particular patient may be
performed, to evaluate toxicity based on the particular patient. The
accessing step includes accessing at least one of drug and drug target
information and drug side effect information. The expanding step uses a
protein-protein interaction network model, and uses gene ontology
information including hierarchical terms, biological processes, cellular
components, and molecular functions. The determining step includes
executing at least one of support vector machine software and logistic
regression analysis software. The expanded set of targets includes
feature information associated with each target, and the method further
includes removing elements of the expanded set of targets based on
feature information. The removing step includes filtering the expanded
set of targets based on associated feature information having a p-value
under a predetermined value, for example about 0.05. The method further
includes the step of cross-validation by balancing the expanded set of
targets, for example by partitioning the expanded set of targets into a
plurality of training sets and a testing set, and then balancing the
plurality of training sets.
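The feature-filtering and balancing steps can be sketched as below. The statistical test is not named in the description, so this sketch assumes a one-sided Fisher's exact test (hypergeometric upper tail) on binary features, and it assumes balancing by random undersampling of the majority class. All function names, the held-out fraction, and the number of training sets are hypothetical.

```python
import random
from math import comb


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test (hypergeometric upper tail) for the
    2x2 table [[a, b], [c, d]], where a = drugs with the feature and the ADR,
    b = feature but no ADR, c = ADR but no feature, d = neither."""
    n_total = a + b + c + d
    positives = a + c        # drugs reporting the ADR
    with_feature = a + b     # drugs carrying the feature
    upper = min(with_feature, positives)
    tail = sum(comb(positives, x) * comb(n_total - positives, with_feature - x)
               for x in range(a, upper + 1))
    return tail / comb(n_total, with_feature)


def select_features(X, y, alpha=0.05):
    """Return indices of binary features whose p-value is under alpha."""
    kept = []
    for j in range(len(X[0])):
        a = sum(1 for row, label in zip(X, y) if row[j] and label)
        b = sum(1 for row, label in zip(X, y) if row[j] and not label)
        c = sum(1 for row, label in zip(X, y) if not row[j] and label)
        d = len(y) - a - b - c
        if enrichment_p_value(a, b, c, d) < alpha:
            kept.append(j)
    return kept


def split_and_balance(y, test_fraction=0.2, n_sets=5, seed=0):
    """Hold out a testing set, then build n_sets training sets in which the
    majority class is randomly undersampled to the size of the minority class."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = max(1, int(len(y) * test_fraction))
    test_idx, train_idx = sorted(idx[:n_test]), idx[n_test:]
    pos = [i for i in train_idx if y[i]]
    neg = [i for i in train_idx if not y[i]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    training_sets = [sorted(minority + rng.sample(majority, len(minority)))
                     for _ in range(n_sets)]
    return training_sets, test_idx
```

Each balanced training set could then be used to fit a separate classifier (for example a support vector machine or logistic regression model), with evaluation on the shared testing set.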

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The above-mentioned and other features and objects of this
invention, and the manner of attaining them, will become more apparent
and the invention itself will be better understood by reference to the
following description of an embodiment of the invention taken in
conjunction with the accompanying drawings, wherein:

[0014] FIG. 1 is a schematic diagrammatic view of a network system in
which embodiments of the present invention may be utilized.

[0015] FIG. 2 is a block diagram of a computing system (either a server or
client, or both, as appropriate), with optional input devices (e.g.,
keyboard, mouse, touch screen, etc.) and output devices, hardware,
network connections, one or more processors, and memory/storage for data
and modules, etc. which may be utilized in conjunction with embodiments
of the present invention.

[0017] FIG. 4A is a chart, FIG. 4B is a network diagram, and FIG. 4C is a
flow diagram, all illustrating drug targets versus drug side effects and
an example of a drug target-expanding network.

[0018] FIGS. 5A and 5B are graph diagrams comparing the classification
performance of statistical modeling and machine learning when using
different PPI confidence levels.

[0019] FIGS. 6A and 6B are graph diagrams comparing the classification
performance of statistical modeling and machine learning when using
different GO annotation levels.

[0020] FIG. 7 is a network diagram illustrating the
cardiotoxicity-associated PPI network built by using drug
target-expanding PPI network modeling.

[0021] Corresponding reference characters indicate corresponding parts
throughout the several views. Although the drawings represent embodiments
of the present invention, the drawings are not necessarily to scale and
certain features may be exaggerated in order to better illustrate and
explain the present invention. The flow charts and screen shots are also
representative in nature, and actual embodiments of the invention may
include further features or steps not shown in the drawings. The
exemplification set out herein illustrates an embodiment of the
invention, in one form, and such exemplifications are not to be construed
as limiting the scope of the invention in any manner.

DESCRIPTION OF EMBODIMENTS OF THE PRESENT INVENTION

[0022] The embodiment disclosed below is not intended to be exhaustive or
limit the invention to the precise form disclosed in the following
detailed description. Rather, the embodiment is chosen and described so
that others skilled in the art may utilize its teachings.

[0023] In the field of molecular biology, gene expression profiling is the
measurement of the activity (the expression) of thousands of genes at
once, to create a global picture of cellular function including protein
and other cellular building blocks. These profiles may, for example,
distinguish between cells that are actively dividing or otherwise
reacting to the current bodily condition, or show how the cells react to
a particular treatment such as positive drug reactions or toxicity
reactions. Many experiments of this sort measure an entire genome
simultaneously, that is, every gene present in a particular cell, as well
as other important cellular building blocks.

[0024] DNA microarray technology measures the relative activity of
previously identified target genes. Sequence-based techniques, like
serial analysis of gene expression (SAGE, SuperSAGE), are also used for
gene expression profiling. SuperSAGE is especially accurate and may
measure any active gene, not just a predefined set. The advent of
next-generation sequencing has made sequence-based expression analysis,
called RNA-Seq, an increasingly popular "digital" alternative to
microarrays.

[0025] Expression profiling provides a view of what a patient's genetic
materials are actually doing at a point in time. Genes contain the
instructions for making messenger RNA (mRNA), but at any moment each cell
makes mRNA from only a fraction of the genes it carries. If a gene is
used to produce mRNA, it is considered "on", otherwise "off". Many
factors determine whether a gene is on or off, such as the time of day,
whether or not the cell is actively dividing, its local environment, and
chemical signals from other cells. For instance, skin cells, liver cells
and nerve cells turn on (express) somewhat different genes and that is in
large part what makes them different. Therefore, an expression profile
allows one to deduce a cell's type, state, environment, and so forth.

[0026] Expression profiling experiments often involve measuring the
relative amount of mRNA expressed in two or more experimental conditions.
For example, genetic databases have been created that reflect a normative
state of a healthy patient, which may be contrasted with databases that
have been created from a set of patients with a particular disease or
other condition. This contrast is relevant because altered levels of a
specific sequence of mRNA suggest a changed need for the protein coded
for by the mRNA, perhaps indicating a homeostatic response or a
pathological condition. For example, higher levels of mRNA coding for a
protein associated with a particular disease may indicate that the cells
or tissues under study are responding to the effects of that disease.
Similarly, if certain cells, for example a type of cancer cells, express
higher levels of mRNA associated with a particular transmembrane receptor
than normal cells do, elevated expression of that receptor may be
indicative of cancer. A drug that interferes with this receptor may
prevent or treat that type of cancer. In developing a drug, gene
expression profiling may be used to assess a particular drug's toxicity,
for example by detecting changes in the expression levels of certain
genes that constitute a biomarker of drug metabolism.
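A common way to quantify the relative mRNA amounts between two conditions is the log2 fold change of mean expression. The sketch below assumes expression values are already normalized and stored as lists per gene; the pseudocount of 1, the two-fold threshold, and the gene names are illustrative assumptions, not values taken from this description.

```python
import math


def log2_fold_changes(condition, reference, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression in `condition` vs `reference`.

    Both arguments map gene name -> list of (normalized) measurements.
    The pseudocount avoids division by zero for genes that are not expressed."""
    changes = {}
    for gene in condition.keys() & reference.keys():
        cond_mean = sum(condition[gene]) / len(condition[gene])
        ref_mean = sum(reference[gene]) / len(reference[gene])
        changes[gene] = math.log2((cond_mean + pseudocount) / (ref_mean + pseudocount))
    return changes


def changed_genes(changes, threshold=1.0):
    """Genes whose absolute log2 fold change is at least `threshold`
    (a threshold of 1.0 corresponds to a two-fold change)."""
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)
```

For instance, a hypothetical receptor gene averaging 15 units in tumor samples and 3 units in normal samples gives log2((15 + 1) / (3 + 1)) = 2, a four-fold increase.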

[0027] For a type of cell, the group of genes and other cellular materials
whose combined expression pattern is uniquely characteristic to a given
condition or disease constitutes the gene signature of this condition or
disease. Ideally, the gene signature is used to detect a specific state
of a condition or disease to facilitate selection of treatments. Gene
Set Enrichment Analysis (GSEA) and similar methods take advantage of this
kind of logic and use more sophisticated statistics. Component genes in
real processes display more complex behavior than simply expressing as a
group, and the amount and variety of gene expression is meaningful. In
any case, these statistics measure how different the behavior of some
small set of genes is compared to genes not in that small set.
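As a much simpler stand-in for such statistics (not GSEA's running-sum enrichment score), the sketch below compares per-gene scores inside a gene set with scores of all other genes using Welch's t-statistic. The per-gene scores (for example log2 fold changes) and the gene identifiers are assumptions for illustration.

```python
import math
from statistics import mean, variance


def gene_set_t_statistic(scores, gene_set):
    """Welch's t-statistic comparing per-gene scores inside `gene_set` with
    scores of all other genes. Each group needs at least two genes, because
    statistics.variance computes the sample (n - 1) variance."""
    inside = [s for g, s in scores.items() if g in gene_set]
    outside = [s for g, s in scores.items() if g not in gene_set]
    se = math.sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / se
```

A large positive value indicates that the genes in the set behave differently (here, are more strongly up-regulated) than genes outside the set.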

[0028] One way to analyze sets of genes and other cellular materials
apparent in gene expression measurement is through the use of pathway
models and network models. Many protein-protein interactions (PPIs) in a
cell form protein interaction networks (PINs) where proteins are nodes
and their interactions are edges. There are dozens of PPI detection
methods to identify such interactions. In addition, gene regulatory
networks (DNA-protein interaction networks) model how the activity of
genes is regulated by transcription factors, proteins that typically bind
to DNA. Most transcription factors bind to multiple binding sites in a
genome. As a result, all cells have complex gene regulatory networks
which may be combined with PPIs to link together these various
connections. The chemical compounds of a living cell are connected by
biochemical reactions which convert one compound into another. The
reactions are catalyzed by enzymes. Thus, all compounds in a cell are
parts of an intricate biochemical network of reactions which is called
the metabolic network, which may further enhance PPI and/or DNA-protein
network models. Further, signals are transduced within cells or in
between cells and thus form complex signaling networks that may further
augment such genetic interaction networks. For instance, in the MAPK/ERK
pathway, a signal is transduced from the cell surface to the cell nucleus by a
series of protein-protein interactions, phosphorylation reactions, and
other events. Signaling networks typically integrate protein-protein
interaction networks, gene regulatory networks, and metabolic networks.
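One possible data structure for combining these networks is a single graph whose edges carry a layer label, with protein-protein and metabolic edges stored in both directions and regulatory (transcription factor to target gene) edges stored in one direction. The class name, layer labels, and node identifiers below are hypothetical.

```python
from collections import defaultdict


class MultiLayerNetwork:
    """Nodes are molecules; each edge carries a layer label such as 'ppi',
    'regulatory' or 'metabolic'. Undirected edges are stored in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # node -> {(neighbor, layer), ...}

    def add_edge(self, source, target, layer, directed=False):
        self._out[source].add((target, layer))
        if not directed:
            self._out[target].add((source, layer))

    def neighbors(self, node, layers=None):
        """Neighbors of `node`, optionally restricted to the given layers."""
        return {n for n, layer in self._out.get(node, set())
                if layers is None or layer in layers}
```

Restricting `layers` lets the same structure serve as a pure PPI network or as an integrated signaling network.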

[0029] The detailed descriptions which follow are presented in part in
terms of algorithms and symbolic representations of operations on data
bits within a computer memory representing genetic profiling information
derived from patient sample data and populated into network models. A
computer generally includes a processor for executing instructions and
memory for storing instructions and data. When a general purpose computer
has a series of machine encoded instructions stored in its memory, the
computer operating on such encoded instructions may become a specific
type of machine, namely a computer particularly configured to perform the
operations embodied by the series of instructions. Some of the
instructions may be adapted to produce signals that control operation of
other machines and thus may operate through those control signals to
transform materials far removed from the computer itself. These
descriptions and representations are the means used by those skilled in
the data processing arts to most effectively convey the substance
of their work to others skilled in the art.

[0030] An algorithm is here, and generally, conceived to be a
self-consistent sequence of steps leading to a desired result. These
steps are those requiring physical manipulations of physical quantities.
Usually, though not necessarily, these quantities take the form of
electrical or magnetic pulses or signals capable of being stored,
transferred, transformed, combined, compared, and otherwise manipulated.
It proves convenient at times, principally for reasons of common usage,
to refer to these signals as bits, values, symbols, characters, display
data, terms, numbers, or the like as a reference to the physical items or
manifestations in which such signals are embodied or expressed. It should
be borne in mind, however, that all of these and similar terms are to be
associated with the appropriate physical quantities and are merely used
here as convenient labels applied to these quantities.

[0031] Some algorithms may use data structures for both inputting
information and producing the desired result. Data structures greatly
facilitate data management by data processing systems, and are not
accessible except through sophisticated software systems. Data structures
are not the information content of a memory, rather they represent
specific electronic structural elements which impart or manifest a
physical organization on the information stored in memory. More than mere
abstraction, the data structures are specific electrical or magnetic
structural elements in memory which simultaneously represent complex data
accurately, often data modeling physical characteristics of related
items, and provide increased efficiency in computer operation.

[0032] Further, the manipulations performed are often referred to in
terms, such as comparing or adding, commonly associated with mental
operations performed by a human operator. No such capability of a human
operator is necessary, or desirable in most cases, in any of the
operations described herein which form part of the present invention; the
operations are machine operations. Useful machines for performing the
operations of the present invention include general purpose digital
computers or other similar devices. In all cases the distinction between
the method operations in operating a computer and the method of
computation itself should be recognized. The present invention relates to
a method and apparatus for operating a computer in processing electrical
or other (e.g., mechanical, chemical) physical signals to generate other
desired physical manifestations or signals. The computer operates on
software modules, which are collections of signals stored on a medium that
represents a series of machine instructions that enable the computer
processor to perform the machine instructions that implement the
algorithmic steps. Such machine instructions may be the actual computer
code the processor interprets to implement the instructions, or
alternatively may be a higher level coding of the instructions that is
interpreted to obtain the actual computer code. The software module may
also include a hardware component, wherein some aspects of the algorithm
are performed by the circuitry itself rather than as a result of an
instruction.

[0033] The present invention also relates to an apparatus for performing
these operations. This apparatus may be specifically constructed for the
required purposes or it may comprise a general purpose computer as
selectively activated or reconfigured by a computer program stored in the
computer. The algorithms presented herein are not inherently related to
any particular computer or other apparatus unless explicitly indicated as
requiring particular hardware. In some cases, the computer programs may
communicate or relate to other programs or equipment through signals
configured to particular protocols which may or may not require specific
hardware or programming to interact. In particular, various general
purpose machines may be used with programs written in accordance with the
teachings herein, or it may prove more convenient to construct more
specialized apparatus to perform the required method steps. The required
structure for a variety of these machines will appear from the
description below.

[0034] The present invention may deal with "object-oriented" software, and
particularly with an "object-oriented" operating system. The
"object-oriented" software is organized into "objects", each comprising a
block of computer instructions describing various procedures ("methods")
to be performed in response to "messages" sent to the object or "events"
which occur with the object. Such operations include, for example, the
manipulation of variables, the activation of an object by an external
event, and the transmission of one or more messages to other objects.

[0035] Messages are sent and received between objects having certain
functions and knowledge to carry out processes. Messages are generated in
response to user instructions, for example, by a user activating an icon
with a "mouse" pointer generating an event. Also, messages may be
generated by an object in response to the receipt of a message. When one
of the objects receives a message, the object carries out an operation (a
message procedure) corresponding to the message and, if necessary,
returns a result of the operation. Each object has a region where
internal states (instance variables) of the object itself are stored and
which the other objects are not allowed to access. One feature of the
object-oriented system is inheritance. For example, an object for drawing
a "circle" on a display may inherit functions and knowledge from another
object for drawing a "shape" on a display.
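The circle/shape inheritance example can be illustrated in Python. To keep the sketch self-contained it computes and describes an area instead of drawing on a display; the class and method names are hypothetical. Note that Python does not strictly enforce private instance variables: the leading underscore on `_radius` is only a convention signalling internal state.

```python
import math


class Shape:
    """Generic shape: knowledge and behavior shared by all shapes."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError  # each specific shape supplies this

    def describe(self):
        # Inherited unchanged by every subclass.
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):
    """A circle inherits describe() from Shape and supplies its own area()."""

    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius  # internal state, by convention not accessed by others

    def area(self):
        return math.pi * self._radius ** 2
```

Calling `Circle(1).describe()` returns `"circle with area 3.14"`, although `describe` is defined only in `Shape`.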

[0036] A programmer "programs" in an object-oriented programming language
by writing individual blocks of code each of which creates an object by
defining its methods. A collection of such objects adapted to communicate
with one another by means of messages comprises an object-oriented
program. Object-oriented computer programming facilitates the modeling of
interactive systems in that each component of the system can be modeled
with an object, the behavior of each component being simulated by the
methods of its corresponding object, and the interactions between
components being simulated by messages transmitted between objects.

[0037] An operator may stimulate a collection of interrelated objects
comprising an object-oriented program by sending a message to one of the
objects. The receipt of the message may cause the object to respond by
carrying out predetermined functions which may include sending additional
messages to one or more other objects. The other objects may in turn
carry out additional functions in response to the messages they receive,
including sending still more messages. In this manner, sequences of
message and response may continue indefinitely or may come to an end when
all messages have been responded to and no new messages are being sent.
When modeling systems utilizing an object-oriented language, a programmer
need only think in terms of how each component of a modeled system
responds to a stimulus and not in terms of the sequence of operations to
be performed in response to some stimulus. Such sequence of operations
naturally flows out of the interactions between the objects in response
to the stimulus and need not be preordained by the programmer.
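The message-and-response cascade described above can be sketched with a simple queue of pending messages: each object reacts to a message by returning follow-up messages, and processing ends when no messages remain. The `Relay` class and `run` function are hypothetical illustrations; if the forwarding relationships contain a cycle, this loop would continue indefinitely, matching the first case described above.

```python
from collections import deque


class Relay:
    """An object that reacts to a message by sending it on to other objects."""

    def __init__(self, forward_to=()):
        self.forward_to = list(forward_to)

    def receive(self, message):
        # Follow-up messages as (recipient_name, payload) pairs.
        return [(target, message) for target in self.forward_to]


def run(objects, first_recipient, payload):
    """Deliver messages until none remain; return the delivery order."""
    pending = deque([(first_recipient, payload)])
    delivered = []
    while pending:
        recipient, message = pending.popleft()
        delivered.append(recipient)
        pending.extend(objects[recipient].receive(message))
    return delivered
```

With objects `a` forwarding to `b` and `c`, and `b` forwarding to `c`, a single message sent to `a` yields the delivery order `["a", "b", "c", "c"]`; the programmer specified only each object's response, not this sequence.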

[0038] Although object-oriented programming makes simulation of systems of
interrelated components more intuitive, the operation of an
object-oriented program is often difficult to understand because the
sequence of operations carried out by an object-oriented program is
usually not immediately apparent from a software listing as in the case
for sequentially organized programs. Nor is it easy to determine how an
object-oriented program works through observation of the readily apparent
manifestations of its operation. Most of the operations carried out by a
computer in response to a program are "invisible" to an observer since
only a relatively few steps in a program typically produce an observable
computer output.

[0039] In the following description, several terms which are used
frequently have specialized meanings in the present context. The term
"object" relates to a set of computer instructions and associated data
which can be activated directly or indirectly by the user. The terms
"windowing environment", "running in windows", and "object oriented
operating system" are used to denote a computer user interface in which
information is manipulated and displayed on a video display such as
within bounded regions on a raster scanned video display. The terms
"network", "local area network", "LAN", "wide area network", or "WAN"
mean two or more computers which are connected in such a manner that
messages may be transmitted between the computers. In such computer
networks, typically one or more computers operate as a "server", a
computer with large storage devices such as hard disk drives and
communication hardware to operate peripheral devices such as printers or
modems. Other computers, termed "workstations", provide a user interface
so that users of computer networks can access the network resources, such
as shared data files, common peripheral devices, and inter-workstation
communication. Users activate computer programs or network resources to
create "processes" which include both the general operation of the
computer program along with specific operating characteristics determined
by input variables and its environment. Similar to a process is an agent
(sometimes called an intelligent agent), which is a process that gathers
information or performs some other service without user intervention and
on some regular schedule. Typically, an agent, using parameters
provided by the user, searches locations either on the host machine or at
some other point on a network, gathers the information relevant to the
purpose of the agent, and presents it to the user on a periodic basis. A
"module" refers to a portion of a computer system and/or software program
that carries out one or more specific functions and may be used alone or
combined with other modules of the same system or program.

[0040] The term "desktop" means a specific user interface which presents a
menu or display of objects with associated settings for the user
associated with the desktop. When the desktop accesses a network
resource, which typically requires an application program to execute on
the remote server, the desktop calls an Application Program Interface, or
"API", to allow the user to provide commands to the network resource and
observe any output. The term "Browser" refers to a program which is not
necessarily apparent to the user, but which is responsible for
transmitting messages between the desktop and the network server and for
displaying and interacting with the network user. Browsers are designed
to utilize a communications protocol for transmission of text and graphic
information over a world wide network of computers, namely the "World
Wide Web" or simply the "Web". Examples of Browsers compatible with the
present invention include the Internet Explorer program sold by Microsoft
Corporation (Internet Explorer is a trademark of Microsoft Corporation),
the Opera Browser program created by Opera Software ASA, or the Firefox
browser program distributed by the Mozilla Foundation (Firefox is a
registered trademark of the Mozilla Foundation). Although the following
description details such operations in terms of a graphic user interface
of a Browser, the present invention may be practiced with text based
interfaces, or even with voice or visually activated interfaces, that
have many of the functions of a graphic based Browser.

[0041] Browsers display information which is formatted in a Standard
Generalized Markup Language ("SGML") or a HyperText Markup Language
("HTML"), both being markup languages which embed non-visual codes in
a text document through the use of special ASCII text codes. Files in
these formats may be easily transmitted across computer networks,
including global information networks like the Internet, and allow the
Browsers to display text, images, and play audio and video recordings.
The Web utilizes these data file formats in conjunction with its
communication protocol to transmit such information between servers and
workstations. Browsers may also be programmed to display information
provided in an eXtensible Markup Language ("XML") file, with XML files
being capable of use with several Document Type Definitions ("DTD") and
thus more general in nature than SGML or HTML. The XML file may be
analogized to an object, as the data and the stylesheet formatting are
separately contained (formatting may be thought of as methods of
displaying information, thus an XML file has data and an associated
method).

[0042] The terms "personal digital assistant" or "PDA", as defined above,
mean any handheld, mobile device that combines computing, telephone,
fax, e-mail and networking features. The terms "wireless wide area
network" or "WWAN" mean a wireless network that serves as the medium for
the transmission of data between a handheld device and a computer. The
term "synchronization" means the exchanging of information between a
first device, e.g. a handheld device, and a second device, e.g. a desktop
computer, either via wires or wirelessly. Synchronization ensures that
the data on both devices are identical (at least at the time of
synchronization).

[0043] In wireless wide area networks, communication primarily occurs
through the transmission of radio signals over analog, digital cellular
or personal communications service ("PCS") networks. Signals may also be
transmitted through microwaves and other electromagnetic waves. At the
present time, most wireless data communication takes place across
cellular systems using second generation technology such as code-division
multiple access ("CDMA"), time division multiple access ("TDMA"), the
Global System for Mobile Communications ("GSM"), or personal digital
cellular ("PDC"); third generation (wideband or "3G") or fourth
generation (broadband or "4G") technology; or packet-data technology over
analog systems such as cellular digital packet data ("CDPD") used on the
Advanced Mobile Phone System ("AMPS").

[0044] The terms "wireless application protocol" or "WAP" mean a universal
specification to facilitate the delivery and presentation of web-based
data on handheld and mobile devices with small user interfaces. "Mobile
Software" refers to the software operating system which allows for
application programs to be implemented on a mobile device such as a
mobile telephone or PDA. Examples of Mobile Software are Java and Java ME
(Java and JavaME are trademarks of Sun Microsystems, Inc. of Santa Clara,
Calif.), BREW (BREW is a registered trademark of Qualcomm Incorporated of
San Diego, Calif.), Windows Mobile (Windows is a registered trademark of
Microsoft Corporation of Redmond, Wash.), Palm OS (Palm is a registered
trademark of Palm, Inc. of Sunnyvale, Calif.), Symbian OS (Symbian is a
registered trademark of Symbian Software Limited Corporation of London,
United Kingdom), ANDROID OS (ANDROID is a registered trademark of Google,
Inc. of Mountain View, Calif.), iPhone OS (iPhone is a registered
trademark of Apple, Inc. of Cupertino, Calif.), and Windows Phone 7.
"Mobile Apps" refers to software programs written for execution with
Mobile Software.

[0045] "PACS" refers to Picture Archiving and Communication System (PACS)
involving medical imaging technology for storage of, and convenient
access to, images from multiple source machine types. Electronic images
and reports are transmitted digitally via PACS; this eliminates the need
to manually file, retrieve, or transport film jackets. The universal
format for PACS image storage and transfer is DICOM (Digital Imaging and
Communications in Medicine). Non-image data, such as scanned documents,
may be incorporated using consumer industry standard formats like PDF
(Portable Document Format), once encapsulated in DICOM. A PACS typically
consists of four major components: imaging modalities such as X-ray
computed tomography (CT) and magnetic resonance imaging (MRI) (although
other modalities such as ultrasound (US), positron emission tomography
(PET), endoscopy (ES), mammograms (MG), Digital radiography (DR),
computed radiography (CR), etc. may be included), a secured network for
the transmission of patient information, workstations and mobile devices
for interpreting and reviewing images, and archives for the storage and
retrieval of images and reports. When used in a more generic sense, PACS
may refer to any image storage and retrieval system.

[0046] FIG. 1 is a high-level block diagram of a computing environment 100
according to one embodiment. FIG. 1 illustrates server 110 and three
clients 112 connected by network 114. Only three clients 112 are shown in
FIG. 1 in order to simplify and clarify the description. Embodiments of
the computing environment 100 may have thousands or millions of clients
112 connected to network 114, for example the Internet. Users (not shown)
may operate software 116 on one of clients 112 to both send and receive
messages over network 114 via server 110 and its associated communications
equipment and software (not shown).

[0048] Bus 212 allows data communication between central processor 214 and
system memory 217, which may include read-only memory (ROM) or flash
memory (neither shown), and random access memory (RAM) (not shown), as
previously noted. RAM is generally the main memory into which operating
system and application programs are loaded. ROM or flash memory may
contain, among other software code, Basic Input-Output system (BIOS)
which controls basic hardware operation such as interaction with
peripheral components. Applications resident with computer system 210 are
generally stored on and accessed via computer readable media, such as
hard disk drives (e.g., fixed disk 244), optical drives (e.g., optical
drive 240), floppy disk unit 237, or other storage medium. Additionally,
applications may be in the form of electronic signals modulated in
accordance with the application and data communication technology when
accessed via network modem 247 or interface 248 or other
telecommunications equipment (not shown).

[0049] Storage interface 234, as with other storage interfaces of computer
system 210, may connect to standard computer readable media for storage
and/or retrieval of information, such as fixed disk drive 244. Fixed disk
drive 244 may be part of computer system 210 or may be separate and
accessed through other interface systems. Modem 247 may provide direct
connection to remote servers via telephone link or the Internet via an
internet service provider (ISP) (not shown). Network interface 248 may
provide direct connection to remote servers via direct network link to
the Internet via a POP (point of presence). Network interface 248 may
provide such connection using wireless techniques, including digital
cellular telephone connection, Cellular Digital Packet Data (CDPD)
connection, digital satellite data connection or the like.

[0050] Many other devices or subsystems (not shown) may be connected in a
similar manner (e.g., document scanners, digital cameras and so on).
Conversely, all of the devices shown in FIG. 2 need not be present to
practice the present disclosure. Devices and subsystems may be
interconnected in different ways from that shown in FIG. 2. Operation of
a computer system such as that shown in FIG. 2 is readily known in the
art and is not discussed in detail in this application. Software source
and/or object codes to implement the present disclosure may be stored in
computer-readable storage media such as one or more of system memory 217,
fixed disk 244, optical disk 242, or floppy disk 238. The operating
system provided on computer system 210 may be a variety or version of
either MS-DOS® (MS-DOS is a registered trademark of Microsoft
Corporation of Redmond, Wash.), WINDOWS® (WINDOWS is a registered
trademark of Microsoft Corporation of Redmond, Wash.), OS/2® (OS/2 is
a registered trademark of International Business Machines Corporation of
Armonk, N.Y.), UNIX® (UNIX is a registered trademark of X/Open
Company Limited of Reading, United Kingdom), Linux® (Linux is a
registered trademark of Linus Torvalds of Portland, Oreg.), or other
known or developed operating system. In some embodiments, computer system
210 may take the form of a tablet computer, typically in the form of a
large display screen operated by touching the screen. In tablet computer
alternative embodiments, the operating system may be iOS® (iOS is a
registered trademark of Cisco Systems, Inc. of San Jose, Calif., used
under license by Apple Corporation of Cupertino, Calif.), Android®
(Android is a trademark of Google Inc. of Mountain View, Calif.),
Blackberry® Tablet OS (Blackberry is a registered trademark of
Research In Motion of Waterloo, Ontario, Canada), webOS (webOS is a
trademark of Hewlett-Packard Development Company, L.P. of Texas), and/or
other suitable tablet operating systems.

[0051] Moreover, regarding the signals described herein, those skilled in
the art recognize that a signal may be directly transmitted from a first
block to a second block, or a signal may be modified (e.g., amplified,
attenuated, delayed, latched, buffered, inverted, filtered, or otherwise
modified) between blocks. Although the signals of the above described
embodiments are characterized as transmitted from one block to the next,
other embodiments of the present disclosure may include modified signals
in place of such directly transmitted signals as long as the
informational and/or functional aspect of the signal is transmitted
between blocks. To some extent, a signal input at a second block may be
conceptualized as a second signal derived from a first signal output from
a first block due to physical limitations of the circuitry involved
(e.g., there will inevitably be some attenuation and delay). Therefore,
as used herein, a second signal derived from a first signal includes the
first signal or any modifications to the first signal, whether due to
circuit limitations or due to passage through other circuit elements
which do not change the informational and/or final functional aspect of
the first signal.

[0052] One peripheral device particularly useful with embodiments of the
present invention is microarray 250. Generally, microarray 250 represents
one or more devices capable of analyzing and providing genetic expression
and other molecular information from patients. Microarrays may be
manufactured in different ways, depending on the number of probes under
examination, costs, customization requirements, and the type of analysis
contemplated. Such arrays may have as few as 10 probes or over a million
micrometre-scale probes, and are generally available from multiple
commercial vendors. Each probe in a particular array is responsive to one
or more genes, gene-expressions, proteins, enzymes, metabolites and/or
other molecular materials, collectively referred to hereinafter as
targets or target products.

[0053] In some embodiments, gene expression values from microarray
experiments may be represented as heat maps to visualize the result of
data analysis. In other embodiments, the gene expression values are
mapped into a network structure and compared to other network structures,
e.g. normalized samples and/or samples of patients with a particular
condition or disease. In either circumstance, a single patient sample may
be analyzed and compared multiple times to focus or differentiate
diagnoses or treatments. Thus, a patient having signs of multiple
conditions or diseases may have microarray sample data analyzed several
times to clarify possible diagnoses or treatments.

[0054] It is also possible, in several embodiments, to have multiple types
of microarrays, each type having sensitivity to particular expressions
and/or other molecular materials, and thus particularized for a
predetermined set of targets. This allows for an iterative process of
patient sampling, analysis, and further sampling and analysis to refine
and personalize diagnoses and treatments for individuals. While each
commercial vendor may have particular platforms and data formats, most if
not all may be reduced to standardized formats. Further, sample data may
be subject to statistical treatment for analysis and/or accuracy and
precision so that individual patient data is as relevant as possible. Such
individual data may be compared to large databases having thousands or
millions of sets of comparative data to assist in the experiment, and
several such databases are publicly available in data warehouses. Due to
the biological complexity of gene expression, careful experimental design
is necessary so that statistically and biologically valid conclusions may
be drawn from the data.

[0055] Microarray data sets are commonly very large, and analytical
precision is influenced by a number of variables. Statistical challenges
include taking into account effects of background noise and appropriate
normalization of the data. Normalization methods may be suited to
specific platforms and, in the case of commercial platforms, some
analysis may be proprietary. The relation between a probe and the mRNA
that it is expected to detect is not trivial. First, some mRNAs may
cross-hybridize with probes in the array that are supposed to detect
another mRNA. Second, mRNAs may experience amplification bias that is
sequence- or molecule-specific. Third, probes that are designed to
detect the mRNA of a particular gene may rely on genomic Expressed
Sequence Tag (EST) information that is incorrectly associated with that
gene.
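The normalization challenge noted above is often handled with quantile normalization. The disclosure does not name a method, so the following Python sketch is an assumption offered for illustration only: it makes every array share the same intensity distribution by replacing each value with the cross-array mean at that value's rank. The function name `quantile_normalize` and the toy intensities are hypothetical.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize expression arrays that measure the same probes.

    `arrays` is a list of equal-length lists, one per microarray. Each value is
    replaced by the mean, across arrays, of the values holding the same rank,
    so every array ends up with an identical distribution of intensities.
    Ties are broken by probe position, which is a simplification.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must measure the same probes")

    # Mean of the r-th smallest value across all arrays, for every rank r.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n_probes)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=a.__getitem__)
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain the same set of values (1.5, 3.5, 5.5); only the probe each value is assigned to differs.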

[0059] First, DrugBank database 302 is exploited as a bioinformatics and
chemoinformatics resource, which contains drug and drug target
information. As of May 2011, there were 5,461 drugs and 3,880 proteins,
which formed 13,457 unique drug-target pairs in DrugBank 302, and they
were extracted as main drug target information. A database module running
on computer system 210 may serve as a computing mechanism to provide a
set of targets for known interactions of a particular drug.

[0060] Second, the Side Effect Resource (SIDER) database 304 is also
used. This database aggregates dispersed public information on ADRs,
including FDA drug labels. There were 877 drugs, 1,447 kinds of ADR, and
61,824 relationships between drugs and ADRs obtained from COSTART and
Euphoria-related ADRs in SIDER. There are 578 drugs shared between
DrugBank 302 and SIDER 304. Other relevant databases may also be included
in Drug Information 306, including but not limited to the comprehensive
drug information provided through drugs.com, drug target information from
the Manually Annotated Targets and Drugs Online Resource (MATADOR at
http://matador.embl.de/), and adverse drug effect information from the
FDA's Adverse Event Reporting System (formerly AERS, now FAERS), and
other databases having similar information.

[0061] Third, the Human Annotated and Predicted Protein Interactions
(HAPPI) database 308 may be used as a global human PPI resource, and
optionally a patient microarray sample (for example, obtained by use of
microarray 250 as part of a patient module running on computer 210) may
also be included in network information 314. HAPPI 308 integrates the
Human Protein Reference Database (HPRD), the Biomolecular Interaction
Network Database (BIND), the Molecular INTeraction database (MINT), the
Search Tool for the Retrieval of Interactive Genes (STRING), and the
Online Predicted Human Interaction Database (OPHID). Most importantly,
HAPPI 308 provides a confidence star quality rating from 1 to 5 for each
interaction based on the initial data sources, data generation methods,
and number of literature references for the interaction. Excluding
self-interactions, there are 116,275, 61,698, 48,481, 24,750, and
35,752 PPIs involved in the data set from 1 star to 5 stars,
respectively. This data may be used to expand the network of drug
targets.
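The "N Stars UP" data sets used in the cardiotoxicity example (for instance, "2 Stars UP") can be read as every HAPPI interaction rated at or above a given star level. A minimal Python sketch of that filter follows, assuming interactions arrive as (protein_a, protein_b, stars) tuples; that tuple layout, the placeholder protein names, and the function name `filter_ppis` are hypothetical and are not part of HAPPI itself.

```python
def filter_ppis(ppis, min_stars):
    """Return undirected PPI edges rated at least `min_stars`, without self-interactions.

    `ppis` is an iterable of (protein_a, protein_b, stars) tuples, a
    hypothetical stand-in for a HAPPI export.
    """
    kept = set()
    for a, b, stars in ppis:
        if a == b or stars < min_stars:
            continue  # drop self-interactions and low-confidence pairs
        kept.add(tuple(sorted((a, b))))  # store each undirected pair once
    return kept


ppis = [("P1", "P2", 5), ("P2", "P1", 3), ("P3", "P3", 4), ("P4", "P5", 1)]
print(filter_ppis(ppis, 2))  # "2 Stars UP": {('P1', 'P2')}
```

Lowering `min_stars` to 1 would correspond to including the lowest-confidence ("1-Star") interactions as well.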

[0062] Finally, Gene Ontology (GO) project 312 provides hierarchical
terms, including biological processes, cellular components, and molecular
functions, to describe the characteristics and annotations of gene
products. Here we use only biological processes, from the general term
"biological process" in level 1 to specific terms in level 15, to expand
the features in the prediction models from drug targets to the GO terms
in order to investigate the biological meanings between drug targets and
ADRs. There are 3,715 biological process terms utilized for annotating
the drug targets. Other databases involving interactions of metabolites,
RNA, DNA, proteins, other gene expression information and other
macromolecules may be included in Network Information 314, including but
not limited to Anatomical Therapeutic Chemical (ATC) Classification
System, which divides drugs into different groups according to the organ
or system on which they act and/or their therapeutic and chemical
characteristics, and other databases having similar information.

[0063] 2. ADR Vs. Drug Target/Drug Target-Expanding Network Facts:

[0064] By combining the drug target information in DrugBank 302 with the
ADR information in SIDER 304, we obtained tabulation 306 of ADR vs. drug
target facts. The facts follow the format shown in FIG. 4A. If drug $n$
has side effect $j$, the value in cell $DS_{nj}$ ($n = 1, \ldots, N$ and
$j = 1, \ldots, J$) at the intersection of column $S_j$ and row $D_n$ is
1 or "TRUE"; otherwise, it is 0 or "FALSE". Likewise, the value in cell
$DT_{nk}$ ($n = 1, \ldots, N$ and $k = 1, \ldots, K$) at the
intersection of column $T_k$ and row $D_n$ is 1 if drug $n$ docks to drug
target $k$, and 0 otherwise. The binary data $DS_{nj}$ and $DT_{nk}$,
representing the ADR vs. drug target facts, may then be used for
prediction model training and testing: each ADR $S_j$ is a prediction
output (response variable), and targets $T_1$ through $T_K$ are features
(independent variables).
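The tabulation described above can be sketched in Python as follows. The sketch assumes the raw facts are available as (drug, ADR) and (drug, target) pairs, as might be extracted from SIDER and DrugBank; the function and argument names are hypothetical, and the drugs, ADR, and targets are toy placeholders.

```python
def build_fact_tables(drugs, adrs, targets, drug_adr_pairs, drug_target_pairs):
    """Build binary DS (drug x ADR) and DT (drug x target) tables as 0/1 row lists."""
    drug_row = {d: n for n, d in enumerate(drugs)}
    adr_col = {s: j for j, s in enumerate(adrs)}
    target_col = {t: k for k, t in enumerate(targets)}

    ds = [[0] * len(adrs) for _ in drugs]
    dt = [[0] * len(targets) for _ in drugs]
    for drug, adr in drug_adr_pairs:          # e.g. pairs taken from SIDER
        ds[drug_row[drug]][adr_col[adr]] = 1
    for drug, target in drug_target_pairs:    # e.g. pairs taken from DrugBank
        dt[drug_row[drug]][target_col[target]] = 1
    return ds, dt


ds, dt = build_fact_tables(
    drugs=["D1", "D2"], adrs=["S1"], targets=["T1", "T2", "T3"],
    drug_adr_pairs=[("D1", "S1")],
    drug_target_pairs=[("D1", "T1"), ("D1", "T3"), ("D2", "T2")],
)
# Training data for ADR S1: features are the rows of dt, response is column 0 of ds.
X = dt
y = [row[0] for row in ds]
print(X, y)  # [[1, 0, 1], [0, 1, 0]] [1, 0]
```

One such (X, y) pair is formed per ADR column, so a separate prediction model is trained for each ADR.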

[0065] When the drug targets are expanded one level in a PPI network or
are annotated using GO terms, the value in cell $DT_{nk}$ will be an
integer instead of binary, because the association between drug $n$ and
drug target $k$ could be repeatedly present in the drug target-expanding
network. FIG. 4B shows an example of a drug target-expanding network, and
FIG. 4C shows the drug target-expanding process and the repeated
presences of $T_1$, $T_2$, and $T_5$. The repeat number here can be
regarded as the weight of the relationship between drug and target at the
network level. In this way, software executing according to tabulation
306 on computer system 210 may serve as a network interaction module that
is configured to expand a set of targets based on network information 314
to produce an expanded set of targets.

[0066] 3. Feature Selection:

[0067] Since thousands of features (drug targets) are available for
building prediction models, model building may be exhaustive and memory
consuming. Moreover, some statistics tools, such as R, have memory
limitations. Hence, these limitations may be mitigated in Feature
Selection process 322 by filtering out the features that would make
little contribution to the response variable. If the data type of cell
$DT_{nk}$ is binary, Fisher's exact test 324 may be used most
effectively; otherwise, the Wilcoxon rank-sum test 326 may be used. In
both methods, features are selected when their p-values are smaller than
0.05. While Fisher's exact test 324 and
Wilcoxon rank-sum test 326 are utilized in this exemplary embodiment,
other tests may be used within the context of the present invention,
including but not limited to: wrapper-based feature selection methods
such as the use of predictive models to score feature subsets prior to
selection, filter-based feature selection methods such as the use of
mutual information or Pearson correlations, or embedded feature selection
methods such as the least absolute shrinkage and selection operator
(LASSO).
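The filtering step can be sketched in Python without external libraries, as below. Fisher's exact test is computed two-sided from the hypergeometric distribution, and the Wilcoxon rank-sum test uses the normal approximation with a tie correction, which is an assumption (an exact rank-sum distribution could be used instead for small samples). The function names are hypothetical; in practice a statistics package such as R or SciPy would normally supply these tests.

```python
from math import comb, erfc, sqrt


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation with tie correction."""
    if not x or not y:
        return 1.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(pooled)

    ranks, tie_term, i = [0.0] * n, 0, 0
    while i < n:  # assign average ranks to runs of tied values
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        t = j - i + 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        tie_term += t**3 - t
        i = j + 1

    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (w - mean_w) / sqrt(var_w)
    return erfc(abs(z) / sqrt(2))


def select_features(dt, y, alpha=0.05):
    """Indices of DT columns whose association with the 0/1 response y has p < alpha."""
    binary = all(v in (0, 1) for row in dt for v in row)
    selected = []
    for k in range(len(dt[0])):
        col = [row[k] for row in dt]
        if binary:  # binary DT -> Fisher's exact test
            a = sum(1 for v, r in zip(col, y) if v == 1 and r == 1)
            b = sum(1 for v, r in zip(col, y) if v == 1 and r == 0)
            c = sum(1 for v, r in zip(col, y) if v == 0 and r == 1)
            p = fisher_exact_p(a, b, c, len(y) - a - b - c)
        else:  # integer weights from an expanded network -> rank-sum test
            p = ranksum_p([v for v, r in zip(col, y) if r == 1],
                          [v for v, r in zip(col, y) if r == 0])
        if p < alpha:
            selected.append(k)
    return selected
```

As in the text, the choice of test is made once for the whole table: Fisher's exact test when every cell is 0 or 1, the rank-sum test otherwise.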

[0068] 4. Sample Balancing:

[0069] The sample sizes of output classes are usually biased and
imbalanced, especially in medical data. Consequently, the accuracy of the
prediction result is often overestimated. In order to improve accuracy,
a sample balancing method is optionally applied. First, the majority
class is randomly separated into many parts, each containing a sample
size close to that of the minority class. Second, every part of the
majority class is combined with the minority class to form training sets
332. The input data may be separated into several parts for cross
validation, for example ten parts in the process of 10-fold cross
validation: nine parts may then be taken to do sample balancing 336, and
the remaining one is used as testing set 334 to validate prediction models
340. Training sets 332 are balanced, while testing set 334 for validation
remains imbalanced in the sample sizes of classes, providing a more
reliable performance estimate.
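The balancing and cross-validation procedure can be sketched in Python as follows. The sketch assumes a 0/1 response list `y` indexed by sample, draws folds and majority-class parts at random, and fixes the number of parts by rounding the majority-to-minority ratio; those choices and the function names are assumptions rather than details given in the disclosure.

```python
import random


def balanced_training_sets(train_idx, y, rng):
    """Split the majority class into parts roughly the size of the minority class
    and combine each part with the whole minority class."""
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    majority = majority[:]
    rng.shuffle(majority)
    n_parts = max(1, round(len(majority) / max(1, len(minority))))
    return [minority + majority[p::n_parts] for p in range(n_parts)]


def cross_validation_splits(y, n_folds=10, seed=0):
    """Yield (balanced training sets, imbalanced test fold) for each CV fold."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        # Only the nine training folds are balanced; the test fold keeps
        # the original class proportions.
        yield balanced_training_sets(train, y, rng), test
```

Each fold yields several balanced training sets, while its test fold keeps the original class proportions; how the models trained on the different sets are combined is not specified in the disclosure.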

[0070] 5. Prediction Models 340:

[0071] For comparisons, prediction models 340 optionally include two
independent procedures: 1) machine learning--support vector machines
(SVM), and 2) statistical modeling--logistic regression. A Support Vector
Machine (SVM) software package may be used, for example an SVM package in
the R programming language called "e1071". For the kernel function, a
nonlinear function such as a Gaussian radial basis function may be used,
which is also the optimized kernel function. This SVM package provides
fitted probabilities ranging from 0 to 1, as does the logistic
regression package used, called "generalized linear models". The
validity of predictive models 340 may be assessed in Performance
Assessment 350. Software running on computer system 210 may thus serve as
a toxicity module that determines if a toxicity reaction is likely based
on an expanded set of targets to output the evaluation of the likelihood
of toxicity for the particular drug with the particular patient.
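The disclosure uses R packages (e1071 for the radial-kernel SVM and generalized linear models for logistic regression). To keep the example dependency-free, the Python sketch below substitutes a plain batch-gradient-descent logistic regression. This is not the fitting algorithm of R's glm (which uses iteratively reweighted least squares), but it likewise returns fitted probabilities between 0 and 1. The function names and the learning-rate and epoch defaults are assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Fitted probabilities in (0, 1), comparable to the 0-to-1 scores in the text."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

In R, the packages named in the text would be used along the lines of `svm(x, y, kernel = "radial", probability = TRUE)` from e1071 and `glm(y ~ ., family = binomial)`; the sketch above only mirrors the logistic-regression half.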

[0072] 6. An Example for Predicting Drug Cardiotoxicity:

[0073] Here we use cardiotoxicity as an example to demonstrate how to
apply our ADR prediction approach based on drug target-expanding network
modeling. There are many ADRs related to cardiotoxicity, according to the
index of the International Classification of Diseases 10th Revision
(ICD-10). We merge all ADRs, each of which has an index ranging from I00
to I99 (classified as diseases of the circulatory system), into one
group, $S_H$. The ADRs related to cardiotoxicity in SIDER and their
ICD-10 indices are listed in Table 1. In the ADR vs. drug target/drug
target-expanding network facts (see the framework in FIG. 1), if any
$DS_{nh}$ is 1, where $D_n$ is drug $n$ and $S_h$ is in the group of
heart-related ADRs (see Table 1), then $DS_{nH}$ is set to 1; otherwise,
$DS_{nH}$ is set to 0.
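The grouping rule above amounts to a logical OR across the heart-related ADR columns. A minimal Python sketch follows, assuming each ADR column has been mapped to an ICD-10 code; the mapping list `adr_icd10` is a hypothetical input standing in for Table 1, and the codes in the example are illustrative.

```python
def is_circulatory(icd10_code):
    """True if an ICD-10 code lies in the I00-I99 range (diseases of the circulatory system)."""
    return bool(icd10_code) and icd10_code[0].upper() == "I" and icd10_code[1:3].isdigit()


def merge_heart_adrs(ds, adr_icd10):
    """Collapse heart-related ADR columns of DS into a single 0/1 response DS_nH.

    `adr_icd10[j]` is the ICD-10 code assigned to ADR column j (None if unmapped).
    """
    heart_cols = [j for j, code in enumerate(adr_icd10) if is_circulatory(code)]
    return [1 if any(row[j] == 1 for j in heart_cols) else 0 for row in ds]


ds = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(merge_heart_adrs(ds, ["I21", "R51", "I48"]))  # [0, 1, 1]
```

The merged list then serves as the single response variable for the cardiotoxicity models.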

[0074] We evaluate the performance of ADR predictions in multiple
experiments by applying standard statistical performance-evaluation
measures, i.e., AUC (area under ROC curve), ACC (accuracy), SEN
(sensitivity), and SPE (specificity). For each evaluation experiment, we
repeat the experiments multiple times and report the statistical results,
for example, performing 10-fold cross validation three times and taking
median values to report prediction performances.
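The four measures can be computed from predicted probabilities as in the Python sketch below. AUC is calculated as the probability that a randomly chosen positive drug is scored above a randomly chosen negative one (ties count one half), which equals the area under the ROC curve. The 0.5 cutoff for ACC, SEN, and SPE is an assumption, since the disclosure does not state the threshold, and the function names are hypothetical.

```python
from statistics import median


def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, cutoff=0.5):
    """ACC, SEN and SPE after calling every score >= cutoff a predicted ADR."""
    tp = sum(1 for s, lab in zip(scores, labels) if s >= cutoff and lab == 1)
    tn = sum(1 for s, lab in zip(scores, labels) if s < cutoff and lab == 0)
    fp = sum(1 for s, lab in zip(scores, labels) if s >= cutoff and lab == 0)
    fn = sum(1 for s, lab in zip(scores, labels) if s < cutoff and lab == 1)
    return {"ACC": (tp + tn) / len(labels), "SEN": tp / (tp + fn), "SPE": tn / (tn + fp)}


def summarize_runs(runs):
    """Median of each metric over repeated cross-validation runs."""
    return {name: median(run[name] for run in runs) for name in runs[0]}


scores, labels = [0.9, 0.8, 0.3, 0.6], [1, 1, 0, 0]
print(auc(scores, labels))                # 1.0
print(threshold_metrics(scores, labels))  # {'ACC': 0.75, 'SEN': 1.0, 'SPE': 0.5}
```

The example shows why the measures are reported together: the ranking is perfect (AUC of 1.0), yet the fixed cutoff still misclassifies one negative drug and lowers specificity.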

[0077] We can observe significant contributions of PPI networks to both
prediction models, as shown in FIG. 5A. When the SVM line is applied, the
prediction performance goes up from AUC=0.579 (using "No Net", i.e., no
PPI network data) to AUC=0.771 (using "2 Stars UP" PPI network data). The
use of PPI data improves prediction performance significantly, with
Accuracy=0.675, Sensitivity=0.632, and Specificity=0.789. The increased
AUC of the "2 Stars UP" condition over the "No Net" condition is
significant, with p-value=4.93e-35 based on the t-test. By further
including the lowest confidence level ("1-Star" PPI network data) in
the drug target-expanding network, the prediction performance decreases
slightly due to noise in molecular networks. The performance curve of the
logistic regression line is comparable to, yet systematically lower than,
that of SVM, moving up from AUC=0.553 (using "No Net") to AUC=0.677
(using "3 Stars UP" PPI network data). The performance of "3 Stars UP"
PPI network data is lower than that of "2 Stars UP" PPI network data, at
Accuracy=0.649, Sensitivity=0.564, and Specificity=0.789. The increased
AUC of the "3 Stars UP" condition over the "No Net" condition is also
significant, with p-value=6.83e-18 based on the t-test. However, the
lower AUC of the "3 Stars UP" condition relative to the "2 Stars UP"
condition is also noticeable, likely due to the functional nature (no
longer biased towards physical PPI events) of biomolecular networks at
the "2 Stars" level reported by the HAPPI database.

[0078] In order to control for the effects of using any type of (random)
biomolecular network and its possible contributions to ADR predictions,
the model's performance was also evaluated with the use of randomized PPI
networks which shared the same network topologies as actual PPI networks.
FIG. 5A also shows that the performance curves using random networks
increased slightly (with AUC>0.55) when the SVM line and logistic
regression line were applied. This result occurs because the original
relationships between drugs and drug targets are still retained in the
simulated random PPI networks. The additional prediction power gained
with real networks, however, may only be explained by the useful network
information that our prediction model automatically learned from real
biological network structures. These results show that the contribution
of PPI network data to drug ADR prediction is primarily due to useful
functional information embedded in biomolecular functional association
networks of drug targets and their related proteins, whereas network
topology alone only plays a peripheral role.
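The disclosure states only that the randomized networks shared the topology of the real ones. One way to obtain such a network, assumed here for illustration, is to permute the protein labels: the edge structure and degree sequence stay identical while the proteins occupying each position are scrambled. The Python sketch below does this; the function name `relabel_network` and the toy edges are hypothetical, and degree-preserving edge swapping would be an alternative randomization.

```python
import random


def relabel_network(edges, seed=0):
    """Randomize a PPI network while keeping its topology: shuffle the protein labels.

    The returned edge set has exactly the same shape (and degree sequence) as
    the input, but each position in the graph is now occupied by a random protein.
    """
    nodes = sorted({v for edge in edges for v in edge})
    shuffled = nodes[:]
    random.Random(seed).shuffle(shuffled)
    mapping = dict(zip(nodes, shuffled))
    return {tuple(sorted((mapping[a], mapping[b]))) for a, b in edges}


real = {("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P4")}
randomized = relabel_network(real, seed=1)
```

Drug-target pairs are left untouched, so each drug keeps its direct targets but, in the randomized network, acquires a neighborhood unrelated to its true biology, which matches the control described above.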

[0079] We also assessed whether the increase in our model's prediction
performance may be due to the increase in the total number of features
when PPI network data are introduced. For this purpose, we focused on the
result obtained from the use of "5 Stars" PPI network data, in which the
number of features obtained by the prediction models becomes much smaller
than that without using any network information. We noted that the AUC of
this experimental result is better than that without using any network
information (p-value=2.70e-8 and 8.22e-9 for the t-test, when we used SVM
and logistic regression, respectively). To further confirm the
relationship between the number of features captured in the model and the
model performance, we performed another experiment in which we gradually
decreased the number of features from the "2 Stars UP" PPI data in the
SVM prediction model by lowering feature selection thresholds. FIG. 5B
shows that there is no significant (p-value=0.469 using ANOVA) decrease
in prediction performance when the number of features is filtered down.
These
observations further support our original finding that the contribution
of PPI network for a drug's ADR prediction performance primarily comes
from network data themselves.

[0081] We evaluated drug ADR prediction performance by integrating GO
annotations available for each drug's protein targets. In two
experiments, shown in FIGS. 6A and 6B, we directly incorporated GO
annotation labels of drug target proteins into our prediction models.
Since each protein-coding gene may be annotated by many GO terms from
different GO hierarchical levels, we carefully designed experiments to
eliminate potential ADR prediction performance biases due to
non-uniformity of GO term hierarchical levels. We aggregated GO terms to
different GO hierarchical levels by applying different thresholds. Since
GO hierarchical level=1 is not biologically meaningful and there is
insufficient data for GO hierarchical levels from 11 to 15, results for
these categories are not shown.

[0082] In FIG. 6A, the GO terms equal to or deeper than specified
threshold GO hierarchical levels are used to annotate drug targets for
comparative drug ADR prediction performance analysis. Our results suggest
that the prediction performances with the use of GO terms, regardless of
which predictive modeling method is used and which criterion is used for
comparisons, are always better than those without the use of GO terms. In
particular, when GO term level 7 (Lv7) is chosen, the best performance may
be achieved with the use of SVM, in which we observed AUC=0.729 and
Sensitivity=0.806; in comparison, "No Net" (without the use of GO term
information) has AUC=0.579. The improvement in overall ADR prediction
performance defined by AUC is significant (p-value=1.80e-18, based on
t-test).
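The thresholded annotation used in FIG. 6A can be sketched in Python as follows, assuming each GO term has been assigned a single hierarchical level (in the real GO graph a term can sit at several depths, so this is a simplification). Counting how many of a drug's targets carry each retained term yields the integer-valued features described earlier. The function name, the dictionaries, and the placeholder term identifiers are hypothetical.

```python
from collections import Counter


def go_features_at_or_below(drug_targets, annotations, levels, min_level):
    """Count, per GO term at level >= min_level, how many of a drug's targets carry it.

    annotations: target -> set of GO term IDs; levels: GO term ID -> level.
    """
    counts = Counter()
    for target in drug_targets:
        counts.update(term for term in annotations.get(target, ())
                      if levels.get(term, 0) >= min_level)
    return counts


annotations = {"P1": {"GO:t_a", "GO:t_b"}, "P2": {"GO:t_b", "GO:t_c"}}
levels = {"GO:t_a": 3, "GO:t_b": 7, "GO:t_c": 9}
features = go_features_at_or_below(["P1", "P2"], annotations, levels, 7)
```

With the threshold at level 7, the level-3 term is discarded, `GO:t_b` is counted twice (both targets carry it), and `GO:t_c` once.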

[0083] In FIG. 6B, the GO terms deeper than level N are replaced by their
level N GO term ancestors to annotate drug targets for comparative drug
ADR prediction performance analysis. We call this process a "Roll Up" and
observed similar results as in the first experiment. In particular, when
GO term Lv7 is chosen, the best performance can be achieved with the use of
SVM, in which we observed AUC=0.736 and Sensitivity=0.800. The
improvement in overall ADR prediction performance defined by AUC over the
"No Net" experiment is also determined to be statistically significant
(p-value=7.75e-17, based on the t-test).
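The "Roll Up" variant can be sketched in a similar way. The Python sketch below assumes a single-parent map from each GO term to its parent one level up (the real GO graph allows multiple parents) and, as a further assumption, drops terms shallower than level N, since the disclosure does not say how such terms are handled. All names and term identifiers are hypothetical.

```python
from collections import Counter


def go_features_rolled_up(drug_targets, annotations, levels, parent, level_n):
    """Replace GO terms deeper than level_n by their level_n ancestor, then count per drug."""
    counts = Counter()
    for target in drug_targets:
        for term in annotations.get(target, ()):
            if levels[term] < level_n:
                continue  # assumption: terms shallower than level_n are dropped
            while levels[term] > level_n:
                term = parent[term]  # single-parent simplification of the GO graph
            counts[term] += 1
    return counts


levels = {"GO:x7": 7, "GO:x8": 8, "GO:x9": 9, "GO:y7": 7}
parent = {"GO:x9": "GO:x8", "GO:x8": "GO:x7"}
annotations = {"P1": {"GO:x9"}, "P2": {"GO:x8", "GO:y7"}}
features = go_features_rolled_up(["P1", "P2"], annotations, levels, parent, 7)
```

Here the deeper terms `GO:x9` and `GO:x8` both collapse onto their level-7 ancestor `GO:x7`, which therefore receives a count of 2, while `GO:y7` keeps a count of 1.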

[0084] Based on the above two experiments using GO terms, we draw the
following conclusions. First, the use of GO annotations improves a drug's
overall ADR prediction performance. Drug ADR prediction performances
achieved with the best use of GO annotation (AUC=0.736) are almost
comparable to those achieved with the best use of PPI networks
(AUC=0.771). Second, SVM models achieve better performance than logistic
regression models. Third, better ADR prediction performance is achieved
when SVM models use GO biological process terms at sufficiently detailed
term levels (e.g., level 7) to annotate drug targets. Fourth, by
evaluating detailed prediction performances achieved
with PPI networks (SEN=0.632, SPE=0.789) and GO annotations (SEN=0.800,
SPE=0.583), the integration of biomolecular network data increases the
specificity (SPE) of ADR predictions, while the integration of GO
annotation data increases the sensitivity (SEN) of ADR predictions.

[0085] 3) A Good ADR Prediction Model is Concentrated not Only on Drug
Targets Implicated with the ADR Events, but Also on Many Non-Target
Proteins Directly Linked to ADR Mechanisms:

[0086] We further investigated the biological network context of the 101
proteins automatically selected as features by the SVM prediction model.
Using the nearest-neighborhood expansion method, we expanded these "seed
proteins" with "2 Stars UP" PPIs to build the PPI network shown in FIG. 7.
Node color and the counts shown in diamond shapes indicate how much
supporting evidence could be identified in PubMed for each protein.
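The seed expansion can be sketched as a one-step nearest-neighbor expansion over a star-rated PPI list. This sketch assumes that "2 Stars UP" means keeping only interactions with a confidence rating of 2 stars or higher, and that each interaction is available as a hypothetical `(protein_a, protein_b, stars)` tuple. The function name and the placeholder protein names are illustrative, not real interaction data or the authors' implementation.

```python
def expand_seeds(seeds, interactions, min_stars=2):
    """One-step nearest-neighbor expansion of seed proteins over star-rated PPIs.

    interactions: iterable of (protein_a, protein_b, stars) tuples (hypothetical format).
    Keeps only interactions rated at least `min_stars` that touch a seed protein.
    """
    seeds = set(seeds)
    nodes, edges = set(seeds), []
    for a, b, stars in interactions:
        if stars < min_stars:
            continue  # drop low-confidence interactions
        if a in seeds or b in seeds:
            nodes.update((a, b))  # add the seed and its direct neighbor
            edges.append((a, b, stars))
    return nodes, edges


# Toy PPI list with placeholder protein names (not real interaction data).
toy_ppis = [("S1", "X1", 3), ("S1", "X2", 1), ("S2", "X3", 2), ("X1", "X4", 5)]
nodes, edges = expand_seeds(["S1", "S2"], toy_ppis)
```

Here `X2` is excluded because its only interaction is rated 1 star, and `X4` is excluded because its interaction does not touch a seed. Whether the network in FIG. 7 also keeps edges among the added neighbors is not stated in the source; this sketch does not.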

[0087] Many of the selected proteins are closely associated with
cardiotoxicity. For example, ADRB1 (adrenergic, beta-1-, receptor)
mediates the effects of the hormone epinephrine and the neurotransmitter
norepinephrine. Polymorphisms of ADRB1 have been shown to be involved in
drug cardiotoxicity in heart failure. Autoantibodies against the
beta-1-adrenergic receptor have also been found in some patients with
idiopathic dilated cardiomyopathy. Therefore, ADRB1, as a known drug
target, serves as a reliable predictor.

[0088] We also observed that the drug target-expanding network may bring
forth additional cardiotoxicity-related non-target proteins, e.g., ERBB4
and CYP2D6. ERBB4 (v-erb-a erythroblastic leukemia viral oncogene homolog
4) is a member of the type I receptor tyrosine kinase subfamily and
encodes a receptor for NDF/heregulin. Targeted deletion and inhibition of
ERBB4 signaling may lead to congestive heart failure resulting from
cardiovascular defects. CYP2D6 encodes a member of the cytochrome P450
superfamily of enzymes. The gene is specifically expressed in the right
ventricle, and its genetic polymorphism is known to be associated with
cardiotoxicity, including poor anti-arrhythmic activity in a patient,
severe cardiovascular effects, or dilated cardiomyopathy.


[0108] While this invention has been described as having an exemplary
design, the present invention may be further modified within the spirit
and scope of this disclosure. This application is therefore intended to
cover any variations, uses, or adaptations of the invention using its
general principles. Further, this application is intended to cover such
departures from the present disclosure as come within known or customary
practice in the art to which this invention pertains.