Role of image informatics in accelerating drug discovery and development.

By Robert Dunkle

Winter 2003

There is a new category of informatics that is proving itself essential to the acceleration of drug discovery and development. Image informatics is the systematic use of image data as a means to help interpret experiments and understand biology. Because up to 70% of the experiments in pharmaceutical research and development produce an image as an output, there is vast potential for image informatics to become a major component of life-science work practices. Image data becomes a valuable asset when it can be fully managed, analysed and interrogated. The result is new insight into the behaviour induced by compounds, leading to a faster time to market across the portfolio of candidates.

The role of informatics as a component of discovery and development has been increasing steadily. UBS Warburg reported in 2001 that approximately 9% of research and development dollars for large pharmaceutical companies is being spent on information technology (IT). Furthermore, the IT share of total R&D spending is projected to increase over the next five years, reflecting the convergence of biology and informatics.

In the realm of life-science informatics, the value of cheminformatics has been repeatedly proven over the past two decades, to the point that computational chemistry and chemical database management systems are a staple of target validation, lead identification and lead optimisation processes. Similarly, bioinformatics is becoming a staple in processes ranging from target identification and target validation to pathway analysis and pharmacogenomics. Until recently, image informatics had not enjoyed a role as important as that of cheminformatics and bioinformatics (Figure 1). But why?

Why has image informatics taken so long to evolve?

Images are a very difficult data type to handle and thoroughly interpret. There are numerous reasons why:

• Images are generated in multiple labs from a variety of instruments. This creates ‘islands’ of data.

• Images come in a wide variety of formats, many of which are proprietary.

• File sizes for a single image can be quite large, ranging from 10MB to 1GB.

• Network bandwidth issues limit transfer of data.

• Film-based images are difficult for groups to review.

• Image storage has been costly.

• Images have no inherent context and require supporting data.

Technology has changed this situation. Networks are significantly faster; data storage capacity costs have dropped dramatically; CCD cameras have made high-quality digital image generation simpler, faster, and less expensive. With these technological advances plus the advent of software to help add context to the images and manage image data, the prospects for image informatics are solidly established.

Image informatics

Images are generated throughout the research and development process (Figure 2).

Mining these images is an opportunity to accelerate these processes. Some examples:

Improve productivity in the lab

Finding images is demanded by many tasks: diagnosing results, comparing results, peer review and preparing reports, to name a few. With data spread out on CDs or a file server indexed by obscure file names, and with very large original images, much time is spent aggregating images and other related data. The oft-reported two hours per week wasted looking for images translates into a 5% loss of productivity for this single item alone. Another major productivity factor comes from the elimination of duplicated work. A large pharma recently reported that the undocumented images from a five-year study were rendered useless when a principal researcher left the company. Another company reported that 4-5% of a lab’s work was a duplicate of prior work. Given the pressure to increase productivity in today’s operations, image management provides an obvious vehicle to reduce inefficiencies through relatively simple process enhancements.
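The 5% figure follows from simple arithmetic if one assumes a 40-hour working week; the article does not state the baseline, so that number is an assumption. A minimal sketch in Python, with a hypothetical function name:

```python
def productivity_loss(hours_lost_per_week: float,
                      work_week_hours: float = 40.0) -> float:
    """Return the fraction of working time lost.

    The 40-hour default is an assumption; the article gives only the
    two-hours-per-week figure and the resulting 5%.
    """
    if work_week_hours <= 0:
        raise ValueError("work_week_hours must be positive")
    return hours_lost_per_week / work_week_hours


# Two hours per week spent looking for images.
loss = productivity_loss(2.0)
print(f"Productivity lost: {loss:.0%}")
```

A different baseline changes the percentage proportionally, so the function takes the working week as a parameter rather than fixing it.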

Fast and effective communication

Sharing results is dramatically simplified and expedited through the dynamic generation of image collections that adhere to study constraints. Sorting and collating results by hand, and literally cutting and pasting results into reports, can be replaced by a capability that generates the desired results on demand and allows those results to be viewed by designated colleagues instantaneously, worldwide.

Facilitate interpretation of experiments

At the simplest level, comparing the results of multiple experiments provides a basis for determining the similarities and differences in how life systems (organs, tissues, cells, subcellular components, etc) respond to insults such as diseases, system upsets or compounds. Comparative scores (whether a diagnostic or a quantitative descriptor) are important, but so too are differences in the nature of the response, even with identical scores; comparing these differences can suggest the same or a different mechanism of action. This type of comparative work is extremely difficult without an organised informatics system to facilitate the effort. Additionally, there is a clear movement in life-science research and clinical operations to observe the response of ‘biomarkers’ as indicators of experimental outcomes. Biomarkers often require specialised algorithms to quantitate results, providing a descriptor that can be used for interpretation and comparison. Figure 3 shows how image data can be organised on the fly, specific to a question at hand.

More effective lead optimisation through cellular screening

As the pharmaceutical industry began adopting high-throughput screening in the mid-90s, firms moved from too few hits to too many hits. Unfortunately, hit selection was often based on a single parameter or very few parameters, such as a measurement of binding, inhibition or other criteria. Many of these measurements are blind to the mechanism of the interaction of the compound with the assay. With the advent of high-throughput cellular screening, the nature of a cell’s response to compounds can be discerned. This, in turn, provides a basis for selecting hits on multiparametric considerations or on parameters that reflect a particular type of cellular response. These cellular assay technologies require up to 100,000 images to be analysed quantitatively per day, which calls for a scalable, mission-critical image management system to support the screening operation and the post-screening analysis and reanalysis of the data.

Systems such as large-scale cellular screening systems have the potential to create large islands of data. The proliferation of cell-screening platforms enables pharmaceutical companies to select and use a variety of platforms and assays to address different project objectives. An implication of this practice is that multiple islands of raw image data are generated unless data is moved into a system agnostic to the various image file formats and data hierarchies employed. Again, the systems approach of image informatics provides a basis to create a single, accessible repository of images, image metadata and other related experimental information that is available for mining.

Virtual screening

Reanalysis of cellular assay data is a perfect example of how the knowledge captured from experiments can be used to understand biology better. During the process of running a cellular assay, predefined algorithms are applied to images to quantitate a particular response of interest. Importantly, should researchers have a new question that they would like to ask of the data set, new algorithms can be developed to answer it without running the physical experiments again, which is particularly important for data sets that are difficult to generate, such as patient data. This virtual screening is both timely and very cost-effective. Not only are labour and the cost of reagents and disposables saved, but the time to get the new results can be a mere fraction of the time needed to repeat the experiment. However, if the image data is not stored in a form conducive to both easy retrieval and reprocessing with different algorithms, such mining cannot occur.
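The idea of asking a new question of stored images can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's system: the repository is a plain dictionary, images are tiny greyscale grids, and both algorithms (`mean_intensity`, `bright_fraction`) are hypothetical stand-ins for real assay readouts.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # greyscale pixel intensities, row by row


def mean_intensity(image: Image) -> float:
    """Stand-in for the algorithm applied when the assay was first run."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def bright_fraction(image: Image, threshold: int = 128) -> float:
    """Stand-in for a new algorithm written later to answer a new question."""
    pixels = [p for row in image for p in row]
    return sum(1 for p in pixels if p >= threshold) / len(pixels)


def reanalyse(repository: Dict[str, Image],
              algorithm: Callable[[Image], float]) -> Dict[str, float]:
    """Apply an algorithm to every stored image; no wet-lab work is repeated."""
    return {image_id: algorithm(image) for image_id, image in repository.items()}


# Hypothetical stored images keyed by well identifier.
repository = {
    "well_A1": [[10, 200], [30, 220]],
    "well_A2": [[10, 20], [30, 40]],
}

original_scores = reanalyse(repository, mean_intensity)
new_scores = reanalyse(repository, bright_fraction)
print(original_scores)
print(new_scores)
```

The point of the design is that the raw images, not just the first set of scores, are retained; any function with the same signature can be run over the repository later.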

Protein expression patterns to help identify mechanism of action

2D electrophoresis gels allow for differential protein expression analysis. With an accurate quantitative description of a gel, it becomes possible to compare gels more exactingly and to search for expression patterns throughout the entire database. A top pharmaceutical company recently reported that analysing images depicting protein expression patterns from 40 compounds resulted in 19 hypothesised mechanisms of action. Subsequent experimentation pinpointed five compounds with mechanisms that were desired. These five compounds were ‘parachuted’ from early-stage lead optimisation to late-stage lead optimisation, saving up to two years for those compounds to reach that same stage of development.

Comparing results from specialised experiments

Xenogen offers an integrated suite of imaging and transgenic technologies that typify how many new methods utilise imaging to advance drug discovery. The in vivo biophotonic imaging illuminates biological processes taking place in a living mammal, providing a ‘window’ into the organism, and makes possible the tracking of biological activity in real time, at the molecular level (Figure 4). Through a programmatic interface, images from Xenogen’s Living Image® software are automatically downloaded into an image database management system, which is capable of uploading the data back to the Xenogen system for reanalysis. The database is a vehicle for browsing and searching for relationships among the data sets, integration with multimodality image data from all sources, and integration with related experimental data. Communication of data analysis findings within a department and across sites worldwide provides an environment that facilitates new ideas and helps ensure that decision-making is fact-based.

Correlating results from preclinical experiments

Typifying drug discovery in general, preclinical research involves multiple disciplines such as pathology, toxicology, pharmacology, autoradiography and others in order to study the effects of a compound or family of compounds. Image informatics can help correlate data from these disciplines to gain a better understanding of biological mechanisms. For example, if work in the autoradiography lab (Figure 5) were to detect an uptake of a compound in the spleen, a researcher in the autoradiography lab could contact colleagues in pathology to determine the observed effect on spleen tissue under those circumstances. Importantly, other compounds which showed a similar uptake in the spleen could be retrieved from the database to provide a point of comparison for both the autoradiography and pathology experiments. This comparison could suggest which set of compounds will have a similar pathology and provide a frame of reference for interpreting the most recent experiments.

Comparison of results requires that image data be placed in context. Context typically consists of four components: representation of the data within an ontological knowledge structure for the organisation, the experimental protocol, experimental results and links to associated data. With the context defined, data can be sorted, clustered and compared by a variety of software packages to enable researchers to consider these results and look for relationships that are descriptive or, potentially, predictive. Even when a pathological analysis or cellular assay analysis shows similar gross scores, there could be different mechanisms at work that are the result of particular functional groups on the compound scaffold being tested. How does one, then, uncover this behaviour? Relating scores of image features to chemical structures provides a platform from which an investigator can consider alternatives and gain insight into behaviour. An example of the structure of a table to assist in this comparison is shown in Figure 6. It is called an ISAR (image-driven structure activity relationship) table.
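One way to picture the four context components, and how they might feed a table that puts image scores next to chemical structures, is the Python sketch below. It is an assumption-laden illustration: the class and field names, the SMILES strings and the scores are all hypothetical, and the layout is not taken from Figure 6.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageContext:
    """The four context components named in the text, plus an image key."""
    image_id: str
    ontology_terms: List[str]          # place in the knowledge structure
    protocol: str                      # experimental protocol
    results: Dict[str, float]          # scored image features
    links: Dict[str, str] = field(default_factory=dict)  # associated data


def isar_rows(contexts: List[ImageContext],
              structures: Dict[str, str]) -> List[dict]:
    """Join image feature scores to compound structures, one row per image."""
    rows = []
    for ctx in contexts:
        compound_id = ctx.links.get("compound")
        if compound_id in structures:
            rows.append({"compound": compound_id,
                         "structure": structures[compound_id],
                         **ctx.results})
    return sorted(rows, key=lambda row: row["compound"])


# Hypothetical compounds (SMILES) and image-derived scores.
structures = {"CMPD-1": "CCO", "CMPD-2": "c1ccccc1"}
contexts = [
    ImageContext("img-002", ["tissue/spleen"], "autoradiography-v1",
                 {"uptake_score": 0.8}, {"compound": "CMPD-2"}),
    ImageContext("img-001", ["tissue/spleen"], "autoradiography-v1",
                 {"uptake_score": 0.8}, {"compound": "CMPD-1"}),
]

for row in isar_rows(contexts, structures):
    print(row)
```

In this toy data both compounds share the same gross score, which is exactly the case the text describes: the structure column is what lets an investigator ask whether different functional groups lie behind identical scores.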

Speeding up clinical trials

Consider the following problem. Multiple offices take images daily of patients around the world. The images are received in a variety of image formats, media and delivery mechanisms (CDs, diskettes, e-mail, FTP). Unfortunately, there is no central system into which to store this data nor is there an appropriate security mechanism for individual databases that are used as stopgap measures. Yet the trials management team needs timely information that can be reviewed to assess the status of the trials and make decisions on conducting the remainder of the trial.

An image database management system can address these issues and provide cost and time efficiencies. Each office can load data directly for automatic population of a central trials repository. This data is instantaneously available for the researcher with appropriate security clearance to assess and to communicate with other team members in a manner compliant with 21 CFR Part 11. Searches and comparisons are quick and easy, and images can be exported to other applications or used in reports. Once the study has finished, the images can be archived and saved as historical data that can still be accessed and viewed if necessary. The net result is less cost, less time and a resultant knowledge database that can be used for the compounds under investigation and as a point of reference for future preclinical and clinical studies.

System considerations

Within a single laboratory environment

A very capable image informatics system, properly designed, can integrate the processes of a single laboratory without compromising simplicity of system installation or operation while allowing for extensibility and integration with other corporate systems as required. The goal of any system, for a single lab or for the enterprise, is to provide an excellent return on time and money. Adaptation to work practices is an important component of reaching this goal. In addition to software, hardware issues need to stand the same test of simplicity, integration and extensibility.

Some of the considerations for system design include the ability for users and administrators to define security rights for the images, to define an appropriate annotation structure and to provide a set of image processing options. With this simple set-up, a laboratory will begin building a lab knowledge repository, gain significant productivity improvements and allow for better interpretation of experimental results.

Strategic informatics

Another dimension of image informatics is how it supports the overall discovery and development objectives as part of the strategic informatics infrastructure needed for extended discovery and development operations. This dimension imposes a set of very demanding requirements, including scalability, extensibility to a highly diverse set of end-user applications, integration with a highly diverse set of related applications, a broad set of functionality that addresses the multiplicity of file formats and work practices found in the enterprise, network compatibility and simplicity of enterprise-wide implementation, training and support.

Yet image informatics does not stand in isolation. There are two fundamental integration issues: integration of solutions with work practices, and integration with related data for mining and analysis. Work practice integration involves interfacing with image-capture devices, LIMS and data storage systems to minimise the effort required of researchers generating and assessing experimental data. Data mining and analysis require image-knowledge repositories to share a structured co-existence with other data repositories such as chemical, gene, protein, analytical and text. Links and ontologies enabling integration of the body of experimental evidence provide the rich associations upon which image informatics is realised. An open architecture facilitates work practice, data mining and analysis.

Return on investment

Many examples of a return on investment resulting from the deployment of an image informatics system have already been mentioned. Each of the following can, by itself, conclusively justify the initial cost, implementation and ongoing operation of an image informatics system:

• Productivity improvements.

• Savings from avoiding duplicate experiments.

• Virtual screening as a replacement for rerunning experiments.

• Terminating compounds earlier based on image-derived evidence.

In addition, there are several process-oriented benefits that are more difficult to quantify but clearly have an impact in advancing the thinking in a project. These include:

The real payoff from image informatics comes from the insights gained into biological mechanisms. Insights lead to testable hypotheses to verify a mechanism of action that could lead to terminating a compound, tuning the scientific direction of a project, or accelerating the consideration of a compound as a drug candidate. Each of these consequences of insights from image informatics has been realised in practice. And each of these consequences can have substantial cost and time savings. At the industry benchmark of $1 million-per-day payback for delivering a drug earlier, these benefits become paramount and vault image informatics into the revered class of mission-critical applications.

Regulatory imperative

Certain image-generation processes will fall under GxP and 21 CFR Part 11, which deals with the FDA’s regulations to assure control of electronic data. Clinical and preclinical operations, and processes upstream of preclinical, could require adherence to these regulations. From a software perspective, this translates into product features such as electronic signatures, image versioning, audit reports and system validation. Regulations are set up to verify that the original data was not changed and that the necessary changes in the representation of the data are auditable. Image data creates added difficulties because of the need to track changes on a pixel-by-pixel basis. Designing software to save the original image and to track the processing steps of image versions automatically provides the imaging laboratory with a controlled environment that does not impact daily workflow. This special functionality, specific to image data, allows the regulations to be accommodated while minimising the impact on the research or regulatory departments within life-science companies.
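The principle of saving the original image and tracking each processing step automatically can be sketched in Python. This is a minimal illustration of the idea only, not a validated or 21 CFR Part 11 compliant implementation; the class names, the user names and the use of SHA-256 digests are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class AuditEntry:
    version: int      # index of the image version this entry describes
    step: str         # name of the processing step
    user: str         # who performed it
    timestamp: str    # UTC time in ISO 8601 form
    sha256: str       # digest of the pixel data for that version


@dataclass
class VersionedImage:
    original: bytes
    versions: List[bytes] = field(default_factory=list)
    audit_trail: List[AuditEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Version 0 is always the untouched original.
        self.versions.append(self.original)
        self._log("original", "system")

    def _log(self, step: str, user: str) -> None:
        data = self.versions[-1]
        self.audit_trail.append(AuditEntry(
            version=len(self.versions) - 1,
            step=step,
            user=user,
            timestamp=datetime.now(timezone.utc).isoformat(),
            sha256=hashlib.sha256(data).hexdigest(),
        ))

    def apply(self, step: str, user: str,
              transform: Callable[[bytes], bytes]) -> None:
        """Create a new version; earlier versions are never overwritten."""
        self.versions.append(transform(self.versions[-1]))
        self._log(step, user)

    def original_unchanged(self) -> bool:
        """Check the stored original against the digest logged at creation."""
        digest = hashlib.sha256(self.original).hexdigest()
        return digest == self.audit_trail[0].sha256


# Hypothetical three-pixel greyscale image, inverted by an analyst.
image = VersionedImage(bytes([10, 20, 30]))
image.apply("invert", "analyst_1", lambda data: bytes(255 - b for b in data))

for entry in image.audit_trail:
    print(entry.version, entry.step, entry.user, entry.sha256[:12])
print("original intact:", image.original_unchanged())
```

Because every version is kept and each one is logged with a digest, any pixel-level change between versions can be detected by comparing the stored data, and the logging happens as a side effect of processing rather than as an extra task for the laboratory.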

Summary: can drug discovery and development be accelerated?

The evidence is conclusive. And the answer is that image informatics can absolutely accelerate drug discovery and development. Acceleration occurs through productivity improvements, better experiment interpretation and new insights derived within a single lab. When applied across labs ranging from target identification through to clinical, acceleration across the entire process multiplies. A commitment to informatics and knowledge management is the first step in this process. The return on investment from ongoing application of image informatics will provide the ongoing evidence that it is a mission-critical component of drug discovery and development.

Robert Dunkle has more than 20 years’ experience in general management, business development and marketing strategy in the bio and cheminformatics markets and is currently CEO of Scimagix. He previously served as President and CEO of the National Centre for Genome Resources where he was responsible for establishing and implementing a new mission and strategic plan serving the bioinformatics and computational biology community. Before that he was Vice-President at MDL Information Systems Inc, where he managed its ISIS™ cheminformatics software platform.