Efficient Analytics for Clinical Array-Based Copy Number Screening

Introduction

When we ask our clinical users what they expect data analysis software to do for them, and how they expect to benefit from array-oriented Copy Number software, the following three points often come up.

Ability to easily edit the list of anomalies and to save it into a standardized clinical report document.

The infoQuant suite of Copy Number analysis software addresses all three points above and has many other distinguishing characteristics, such as compatibility with all major array platforms, platform-specific detection methods and a fully interactive interpretation workflow. We designed all components of the suite around a simple idea: to help Cytogenetic laboratories across the globe dramatically improve the efficiency of their reporting procedures. In this paper we illustrate how infoQuant software is used to streamline the data analysis step in routine array-based Cytogenetic screening procedures.

Data import and organization

Both infoQuant analytical packages, oneClickCGH and CGH Fusion, work with a variety of raw array data formats generated by major array platforms. Our built-in analytics cover array types such as

Affymetrix SNP 6.0 and CytoScan HD

Agilent CGH and CGH+SNP arrays

Illumina CytoSNP and Omni array range

BlueGnome

OGT

Roche-Nimblegen CGH/CNV array range

other array platforms

The intelligent data import module not only makes loading data an easy single-click task, but also lets clinical infoQuant users compare their legacy data across different array types and make use of every piece of array data they accumulate in their clinical tests. For many labs, being able to look at an archive of patient results obtained with BAC arrays a few years earlier has proved vital when interpreting high-resolution oligo data. Staying open to newer, more advanced and more efficient hardware platforms is also critical for high-throughput laboratories, so analytical software that lowers switching costs in staff re-training and workflow re-design is of great benefit to them. And with NGS technology actively making its way into Cytogenetic laboratories, cross-technology compatibility of analytical software will prove beneficial once again.

In a high-throughput environment, handling array data becomes a challenging task. Data volumes themselves pose a significant issue, but when one multiplies them by the complexity of the interpretation procedure for each individual sample, the reporting process can quickly become unmanageable. infoQuant's CGH Fusion software was designed specifically with this problem in mind and tackles it from several directions.

First, depending on the particular array technology, the software can compress probe-wise results down to 20% of the size of the original file generated by the hardware platform, which makes data storage and transfer across the network much easier.

Second, project organization in CGH Fusion gives every user an easy-to-navigate overview of the dozens or hundreds of samples submitted for interpretation, with a detailed and intuitive snapshot of the reporting stage of every sample.

Third, the ability to seamlessly integrate existing sources of patient information (demographic or phenotypic) into the analysis workflow, or to add such information manually, ensures that all types of available information are captured during a Cytogenetic test.

When a laboratory's throughput reaches a critical scale, implementation of a dedicated data management solution such as cnTrack may be required. Please see the separate "Effective data management" white paper for more detail on this topic.

Prioritization of detected DNA anomalies

infoQuant software takes care of detecting DNA anomalies, such as copy number changes and regions of LOH, in thousands to millions of probe-wise array measurements. How to make use of these results, however, is very much up to the user, although our solutions are fully equipped to assist cytogeneticists in this critical task too.

Data interpretation starts with an assessment of overall data quality for each individual array-based test. For every sample in the analysis pipeline, the software provides a quantitative snapshot of various quality metrics optimized for the array platform in use. These metrics can be matched against the laboratory's quality criteria, and a sample can be omitted from further analysis at this stage if the criteria are not met. This is a very efficient tool for monitoring array experiments for possible protocol irregularities.
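To make the idea of matching quality metrics against laboratory criteria concrete, here is a minimal sketch in Python. It is not infoQuant's implementation: the metric names (log_ratio_spread, call_rate), the limits and the sample identifiers are hypothetical and serve only to illustrate the accept/omit decision described above.

```python
from dataclasses import dataclass


@dataclass
class QualityCriterion:
    """One hypothetical laboratory acceptance rule for a single quality metric."""
    metric: str
    limit: float
    higher_is_better: bool

    def passes(self, value: float) -> bool:
        # A metric passes when it is on the acceptable side of the limit.
        return value >= self.limit if self.higher_is_better else value <= self.limit


def failed_criteria(metrics, criteria):
    """Return the names of metrics that are missing or violate the limits."""
    failures = []
    for criterion in criteria:
        value = metrics.get(criterion.metric)
        if value is None or not criterion.passes(value):
            failures.append(criterion.metric)
    return failures


# Illustrative limits only; real limits are platform- and laboratory-specific.
LAB_CRITERIA = [
    QualityCriterion("log_ratio_spread", 0.30, higher_is_better=False),
    QualityCriterion("call_rate", 0.98, higher_is_better=True),
]

# Made-up per-sample metric values.
samples = {
    "S001": {"log_ratio_spread": 0.18, "call_rate": 0.995},
    "S002": {"log_ratio_spread": 0.41, "call_rate": 0.990},
}

# Samples with no failed criteria continue to interpretation.
accepted = [sid for sid, m in samples.items() if not failed_criteria(m, LAB_CRITERIA)]
print(accepted)
```

In this sketch a missing metric counts as a failure, so an incomplete quality snapshot never lets a sample through by accident. That is a design assumption of the example, not a documented behaviour of the software.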

The most critical and time-consuming part of the analysis is the assessment of clinical relevance for each detected DNA abnormality. This is where a significant amount of input is required from a geneticist in order to make sense of suspected regions. The infoQuant suite contains a wealth of software components that help to streamline and automate this process as much as possible. Both the oneClickCGH and CGH Fusion packages assist laboratory geneticists with the relevance assessment of their results at three different levels.

First, the user interface is designed to simplify navigation between abnormal regions as much as possible, saving users time when switching from one anomaly to another. Ease of interaction with regions highlighted on chromosomal diagrams and the speed of switching between different sources of important information often define the throughput level of a genetic laboratory. Users of infoQuant solutions can reduce the time required for sample review by up to 40% by using our interactive interpretation workflow.

Second, clinicians frequently have to go through large amounts of gene and region information when trying to interpret a particular abnormal region. The availability of various types of gene and CNV annotations is therefore an important aspect of all our solutions. Publicly available annotations such as gene-disorder associations, biological processes, gene functions and frequent copy number variants are at the geneticist's fingertips, which helps to narrow down suspected regions of clinical relevance quickly and accurately. In addition, CNV frequency profiles of various groups of HapMap samples are accessible during interpretation and provide an additional means of singling out clinically relevant regions. Region annotations may also come from previously analyzed samples via custom track visualization. Custom tracks are easy to create and edit directly in the software, which makes re-using legacy test results and interactively updating them with newly available data rather straightforward. Such historical information can be drilled down further to a patient's demographics, phenotype and family relationships.
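The core operation behind matching a detected region against an annotation or custom track is an interval overlap test. The sketch below shows one way it could be done in Python. It is an assumption-laden illustration rather than the method used in infoQuant software: the Region type, the 50% coverage threshold and all coordinates and labels are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval; start is inclusive, end is exclusive (hypothetical type)."""
    chrom: str
    start: int
    end: int
    label: str


def overlap_fraction(query: Region, other: Region) -> float:
    """Fraction of `query` covered by `other`; 0.0 on different chromosomes."""
    if query.chrom != other.chrom:
        return 0.0
    shared = min(query.end, other.end) - max(query.start, other.start)
    return max(shared, 0) / (query.end - query.start)


def annotate(query: Region, track, min_fraction: float = 0.5):
    """Labels of track entries covering at least `min_fraction` of the query."""
    return [r.label for r in track if overlap_fraction(query, r) >= min_fraction]


# Made-up detected region and annotation track, for illustration only.
detected = Region("chr1", 1_000_000, 1_400_000, "loss")
cnv_track = [
    Region("chr1", 900_000, 1_300_000, "common CNV A"),
    Region("chr1", 1_350_000, 1_500_000, "common CNV B"),
    Region("chr2", 1_000_000, 1_400_000, "common CNV C"),
]

print(annotate(detected, cnv_track))
```

Here only "common CNV A" covers at least half of the detected region, so it is the only label reported. A laboratory would choose its own coverage threshold, or a reciprocal overlap rule, to suit its screening procedure.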

Finally, if the user decides to rely on the assisted interpretation module of the software, that option is also available and can save a lot of time during routine analysis. With a single mouse click, clinically relevant abnormalities can be highlighted and, for example, restricted to a disorder class of interest. Shortlisted regions can then be prioritized according to their likelihood of occurrence in a normal patient.
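The shortlisting and prioritization step can be pictured as a filter followed by a sort. The following Python sketch assumes that each anomaly carries a set of associated disorder classes and a frequency of occurrence in a reference population of normal samples. The field names, disorder classes, region labels and frequencies are all hypothetical, and the ranking rule (rarest first) is our illustrative choice, not a description of the software's internal scoring.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    """A detected abnormal region with hypothetical annotation fields."""
    region: str
    disorder_classes: frozenset
    population_frequency: float  # share of normal reference samples with an overlapping variant


def shortlist(anomalies, disorder_class):
    """Keep anomalies linked to the disorder class and rank the rarest first."""
    hits = [a for a in anomalies if disorder_class in a.disorder_classes]
    # A lower frequency in normal samples suggests a less likely benign variant.
    return sorted(hits, key=lambda a: a.population_frequency)


# Made-up anomalies, for illustration only.
anomalies = [
    Anomaly("region_A", frozenset({"neurodevelopmental"}), 0.02),
    Anomaly("region_B", frozenset({"cardiac"}), 0.001),
    Anomaly("region_C", frozenset({"neurodevelopmental", "cardiac"}), 0.0005),
]

for anomaly in shortlist(anomalies, "neurodevelopmental"):
    print(anomaly.region, anomaly.population_frequency)
```

Sorting by ascending population frequency puts the regions least likely to occur in a normal patient at the top of the review list, which mirrors the prioritization described above.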

The way array CGH or SNP array data is interpreted is often unique to a particular genetic laboratory. Depending on the scope of the screening procedure and on sample throughput, infoQuant users are generally able to reduce the time spent on reporting for each patient by 40-70% through software automation.

Report generation

The results of data interpretation should generally be recorded in a format that is commonly accepted and easy to share within and outside the laboratory. The preferred report formats are graphics-rich documents that can be stored and transmitted electronically and printed if a paper trail is required. Requirements for the amount of information and the presentation style of the report can vary greatly from one institution to another and from one test type to another. Additionally, depending on their target audience, laboratories may need two or sometimes three different report formats, for instance one for internal purposes and one for outside reporting.

The infoQuant suite allows each laboratory to choose its documentation format, to select from a variety of pre-set report components and to customize how different components and information types are presented. This approach brings a healthy level of flexibility and customization to reporting procedures within the software, while standardizing the end result of the data analysis workflow.

Conclusion

High-throughput Cytogenetic analysis is a challenging task from many points of view, and data interpretation is far from the least important part of it. There are many potential roadblocks on the way to a successful laboratory-wide implementation of array-based reporting processes. They arise from such aspects of the job as large and continuously increasing data volumes, the evolution of technological platforms, the uniqueness and complexity of each patient's profile under review, the multi-user environment and, sometimes, staff turnover.

infoQuant solutions were designed specifically to tackle those potential roadblocks through an intelligently automated workflow, ease of deployment and use, flexibility of the data interpretation process and standardization of reporting procedures. Such streamlined data analytics can greatly reduce laboratory costs in terms of staff time, freeing up resources for a laboratory to scale up its array CGH or SNP array operations.