MEETING REPORT

PURPOSE

The overarching goal of CFSF is (1) to develop an integrative strategy for generating predictive molecular activity and cellular feature signatures induced by systematic perturbations of cell-based systems, and (2) to apply this information to an understanding of biological networks in health and disease.

The primary objective of this workshop was to develop a strategic template for undertaking the CFSF initiative. This template must assess the critical parameters affecting the choice of cell types, perturbations, cellular and molecular phenotypic assays, data analysis methods, and data storage and presentation formats.

BACKGROUND

Biomedical research has traditionally focused on characterizing individual molecules or subsets of molecules in an effort to understand the complex functional interactions within a biological system. However, this becomes a limiting factor when screening for optimal targets for therapeutic interventions or during the development of a suitable phenotype. Extensive knowledge of the structure and function of the biological networks involved, and of their responses to diverse perturbations, can help overcome this limitation. Characterization of biological networks in a wide range of normal and disease contexts is critical for understanding how genetic and environmental perturbations produce pathological conditions, and for developing novel interventions that aim to return perturbed networks to normal states. Studies using classical genetic screens demonstrate that disruptions in a biological pathway yield related phenotypes at both the cellular and molecular levels. Hence, perturbation-induced molecular activity and cellular signatures can be used to infer mechanism-based relationships among perturbing conditions as well as functional associations among responding cellular components. The generation of a library of network-based cellular signatures will optimize the derivation of mechanistic insights into disease etiology and the identification of novel drug targets.

ORGANIZATION OF WORKSHOP

The workshop started with a series of six presentations on the current state and best practices of network reconstruction and drug discovery, given by researchers from both academia and pharma. Speakers were charged with relating their presentations to the goal of the workshop. Presentations highlighted cutting-edge technology and identified strengths and limitations in the current approaches. Representatives from pharma reported success with the CFSF approach in their in-house research and agreed with academic researchers on the power and transformative potential that such a program could have for biomedical research if sponsored by the NIH.

The afternoon of the first day was devoted to breakout groups on four topics: Perturbation Screening, Phenotypic Assays, Data Analysis, and Biological and Clinical Applications. On the second day, each breakout group presented a summary of its conclusions, followed by general discussion and synthesis of challenges and opportunities for the CFSF program.

BREAK-OUT GROUP RECOMMENDATIONS

Perturbation Screening: This group identified the relative advantages and disadvantages of two types of perturbations: targeting proteins with small molecules or targeting genes with RNAi. Neither approach was inherently better than the other, as each had its own strengths and weaknesses. The group also identified additional types of perturbation approaches that could be considered for CFSF. The group discussed the risks and benefits of combining perturbations to complement single-perturbation screens, as well as strategies to optimize small molecule screening to efficiently cover chemical or genetic space. Recommendations for implementing perturbation screening for the CFSF program were to: (1) keep the approach as simple as possible to implement, (2) adopt best practices for data generation and handling, and provide access to raw experimental data along with processed results, (3) capture early response time points, and (4) have strong programmatic assessment to keep the program dynamic and allow for mid-course corrections. The summary also emphasized that integrating the different screening procedures, data sources, and heterogeneous data is a substantial challenge.

Phenotypic Assays: This group identified the most informative and/or lowest-cost molecular signature assays currently available, including DNA arrays (both standard and newer versions) and cellular phenotypes (through high content imaging, protein localization, biochemical and reporter assays). Technology development is needed for high-information-content assays for proteomics, phospho-proteomics, and genome modification. Recommendations for addressing the challenge of integrating data from molecular and non-molecular cellular phenotype screens included the use of low-throughput curated annotations and high-throughput database integration, but the group emphasized the great importance of improved controlled vocabularies to enable standardized annotations. The choice of cell types for CFSF must take into account technical issues, the appropriateness of model organisms, and natural variation between cells of the same type. The group recommended concentrating on fewer cell types and characterizing them thoroughly rather than conducting broad, shallow surveys across many models. The group also emphasized the importance of validating findings, preferably in the whole organism.

Data Analysis: This group noted that the data handling and analysis challenges presented by the CFSF program will depend largely on the structure of the program: whether there will be a central data coordination center that integrates data from separate data generation centers, or whether the bulk of data integration will be performed by the centers themselves, with the data coordination center merely coordinating across these center-specific data sets. The group emphasized the importance of ensuring that data are usable and integrable, through the use of common vocabularies in annotation and standardized protocols for collecting the data, along with access to both raw and processed data. The challenge presented by the highly heterogeneous data generated by CFSF will be substantial and will change over time as new techniques are developed, so the program must allow for this flexibility. The group also strongly encouraged specific programs to support collaboration and the development of new approaches for data analysis. Data querying tools must be able to accommodate a wide range of users. A user-friendly interface for biologists to query the database is essential, but there must also be support for sophisticated queries by expert users. The most productive approaches to computation and data analysis ensure that computational experts are involved in the experimental design phase, that new computational approaches can be added later in the process, and that software is openly and freely available to the community.

Biological and Clinical Applications: This breakout group emphasized that data analysis and data collection should be integrated into each center, although additional purely analytic proposals should be supported through a linked/collaborative grant mechanism. New data generated by the CFSF program must be integrated with existing knowledge (genomic/transcription/pathways) whenever possible. One of the greatest challenges will be the interface between the cellular and organismal levels; the program should not widen this division but should work to bridge it. There was also interest in exploring ways to incentivize companies to share their data on drug-pathway interactions.

Final Recommendations: It was recommended that CFSF be undertaken in two distinct phases. The goal of phase 1 is to test multiple ideas, including the selection of cell lines and perturbations, uniform and standardized assay protocols, the development of a database and standard vocabularies for describing the assays and results, and technology development focused on reliable high-throughput measurements. Some of the challenges identified for both phases were (1) concentrating on a few cell types and systems and finding assays that can be analyzed together, (2) storing the matrix of data (both raw and processed) in a single data model while making it available in multiple formats for computational analysis and general browsing, and (3) classifying phenotypic signatures in a way that is more relevant and sophisticated than is available through efforts like GO. The participants agreed that, along with gene expression, the CFSF effort should focus on multiple cellular phenotypes, including high content imaging, protein localization, biochemical assays, and reporter assays, supplemented by kinetic assays. Important perturbations to be studied should include small molecules and RNAi, along with environmental perturbations (O2, glucose, pH), in dose-response experiments at multiple time points.

Other key final recommendations included the following:

Datasets generated through this initiative should also be integrated with existing genomic/transcription/pathway data.

Key requirements of a public resource include data provenance, availability of raw data to perform new normalization, assay-specific experimental controls, and SOPs for each assay and screening platform. A reagent repository for cells, perturbagens, and assays is also needed.

The initiative should push the boundary of what is possible with rich assays (technology development for moving these assays to higher throughput, e.g., antibody arrays for time-resolved changes in protein abundance and phosphorylation).

It would be critical to include second-stage funding (R03) with a rapid review cycle for limited-duration, limited-funding computational analyses and secondary assays to validate primary results.

Ongoing studies of “generic cell types” (e.g., HeLa cells) should be complemented with analysis of differentiated cell types.