Software Standards in Next-Gen Genomics

LIMS can help automate data transfer with sequencers.

By Bruce Pharr

September 27, 2011 | Guest Commentary | Standards are emerging in next generation sequencing (NGS), thanks mainly to the critical investments labs have made in instrumentation. Sequencing proceeds faster and produces higher quality results when labs are able to consistently and rapidly prepare samples for sequencing, monitor sequencing runs in real time, and deliver appropriately formatted data for analysis. Each of these capabilities can be supported with preconfigured workflows in a Laboratory Information Management System (LIMS) built to support the best practices associated with running a lab’s preferred instrumentation.

As sequencing throughput grows, instrument manufacturers have sought to standardize the wet lab processes used to prepare DNA libraries. Labs hate to waste resources on library prep only to have runs fail due to preventable errors. Today, most instrumentation vendors have grouped recommended laboratory procedures into prep kits, such as Illumina’s TruSeq DNA and RNA protocols.

Software vendors can, consequently, develop LIMS workflows mapped to the lab work and procedures associated with these kits. For example, a workflow for Illumina TruSeq kits could capture specific sample tracking details, including sample contents of microtiter plates and flow cells; index composition of multiplexed libraries; lot numbers of kits and consumables; and experimental parameters that vary between sample batches.
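As a rough illustration of the tracking details listed above, the following Python sketch models one indexed library and checks a multiplexed pool for index collisions. The `LibraryRecord` fields, the `duplicate_indexes` helper, and the sample IDs, index sequences, and lot numbers are all hypothetical and do not describe any particular LIMS product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryRecord:
    """One indexed library as a LIMS might track it (hypothetical fields)."""
    sample_id: str
    plate_well: str      # position on the microtiter plate, e.g. "A1"
    index_sequence: str  # barcode that identifies the library in a pool
    kit_lot: str         # lot number of the prep kit used


def duplicate_indexes(pool):
    """Return index sequences shared by more than one library in a pool."""
    counts = Counter(lib.index_sequence for lib in pool)
    return sorted(seq for seq, n in counts.items() if n > 1)


# Illustrative values only; S1 and S3 deliberately share an index.
pool = [
    LibraryRecord("S1", "A1", "ATCACG", "LOT-001"),
    LibraryRecord("S2", "A2", "CGATGT", "LOT-001"),
    LibraryRecord("S3", "A3", "ATCACG", "LOT-002"),
]
print(duplicate_indexes(pool))
```

A check of this kind is one plausible use of recorded index composition: two libraries that share an index cannot be separated after sequencing, so catching the collision before pooling avoids a wasted run.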

Preconfigured workflows remove many of the tedious manual steps associated with library preparation. A LIMS can automatically calculate the titrations necessary to normalize a library and lets scientists browse the contents of tubes and plates by the laboratory process that created them, helping them reduce sequencing time and reagent use.
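The normalization arithmetic can be sketched in a few lines of Python, assuming it reduces to a simple dilution in which concentration times volume is conserved (C1 x V1 = C2 x V2). The function name, the units (nM and microliters), and the example numbers are assumptions made for illustration; a real workflow would follow the kit's own protocol.

```python
def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2, where C1 is the stock concentration and
    C2 and V2 are the target concentration and final volume.
    """
    if stock_nm <= 0 or target_nm <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nm > stock_nm:
        raise ValueError("a library cannot be concentrated by dilution")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Hypothetical example: dilute a 10 nM library to 2 nM in 20 uL.
library_ul, diluent_ul = dilution_volumes(10, 2, 20)
print(library_ul, diluent_ul)  # 4.0 16.0
```

Rejecting a target above the stock concentration is a deliberate choice in this sketch: such a library needs rework rather than dilution, and it is better for the software to flag that than to return a negative diluent volume.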

Preconfiguration Plusses

Vendors have also developed standards for the media onto which libraries are loaded for sequencing. Clonal amplification for Illumina HiSeqs occurs using the cBot cluster generation system, which employs a proprietary emulsion PCR-free technique. Life Technologies’ SOLiD systems accept PCR-prepared clonal bead populations deposited on a flow slide. Roche 454 systems utilize a pyrosequencing-based technique using emulsion PCR and agarose beads.

The most important step in integrating LIMS with instrumentation is to consider how users gather information about samples and associate them with runs. Workflows can specify the appropriate type of integration based on the lab’s instrumentation. Instrument integration capabilities have long been a key selling point of LIMS. In a 2009 report, users cited instrument integration as the most desired capability in a LIMS, with 70% of academic lab managers ranking it number one.

The best LIMS can apply either a push model (in which the LIMS supplies data to the instrument and periodically checks back to download run data when complete) or a pull model (in which the instrument initiates contact with the LIMS to collect the data necessary to run experiments). Either way, certain tasks can be preconfigured. These include ensuring that items sent to the sequencer are appropriately matched to data in the LIMS; generating the necessary files to communicate with the sequencer before and after sequencing; monitoring status directly across multiple instruments; and converting raw data files into FASTQ format for analysis.
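The difference between the two models comes down to which side initiates the exchange. The toy Python sketch below makes that concrete with in-memory objects; the `Lims` and `Instrument` classes, their method names, and the flow cell IDs are invented for illustration and do not correspond to any vendor's interface.

```python
class Lims:
    """Toy in-memory LIMS holding run manifests keyed by flow cell ID."""

    def __init__(self, manifests):
        self.manifests = manifests  # flow_cell_id -> list of sample IDs
        self.run_data = {}          # flow_cell_id -> results, once collected

    def manifest_for(self, flow_cell_id):
        """Pull model entry point: the instrument asks for its manifest."""
        return self.manifests[flow_cell_id]

    def push_to(self, instrument, flow_cell_id):
        """Push model, step 1: the LIMS supplies the manifest."""
        instrument.load(flow_cell_id, self.manifests[flow_cell_id])

    def check_back(self, instrument):
        """Push model, step 2: poll once and store run data if complete."""
        if instrument.finished:
            self.run_data[instrument.flow_cell_id] = instrument.results
            return True
        return False


class Instrument:
    """Toy sequencer that records what it was given."""

    def __init__(self):
        self.flow_cell_id = None
        self.samples = []
        self.finished = False
        self.results = None

    def load(self, flow_cell_id, samples):
        self.flow_cell_id = flow_cell_id
        self.samples = list(samples)

    def pull_from(self, lims, flow_cell_id):
        """Pull model: the instrument initiates contact with the LIMS."""
        self.load(flow_cell_id, lims.manifest_for(flow_cell_id))

    def finish(self, results):
        self.finished = True
        self.results = results


lims = Lims({"FC1": ["S1", "S2"], "FC2": ["S3"]})

pushed = Instrument()
lims.push_to(pushed, "FC1")
first_check = lims.check_back(pushed)    # run still in progress
pushed.finish({"total_bases": 1000})
second_check = lims.check_back(pushed)   # run data collected

pulled = Instrument()
pulled.pull_from(lims, "FC2")
print(first_check, second_check, pulled.samples)
```

In both cases the manifest comes from the same LIMS records, which is what keeps the items on the sequencer matched to the data in the LIMS regardless of which side starts the conversation.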

With some runs lasting a week or more, it is inefficient for organizations to wait until those runs are completed before evaluating data quality. Preconfiguration can enable a LIMS to monitor sequencing as it occurs and collect a range of key primary analysis metrics automatically from each run. Common metrics might include the total bases yielded from a run or the percentage of base calls with a Phred quality score of Q30 or higher. A LIMS can also produce reports after run completion that summarize contextual information about the run: the samples included, identification numbers, the location of the raw data, and the sequencer used. More importantly, a LIMS offers a way for labs to compare sequencing performance and view trends from data accumulated over time. With data from multiple sample runs archived and searchable in a centralized LIMS, labs can make more informed decisions about which samples to rework or how much time to spend on further analysis.
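To show how the two metrics mentioned above might be derived, here is a minimal Python sketch that computes total bases and the percentage of base calls at or above Q30 from FASTQ quality strings. It assumes Phred+33 quality encoding and an inclusive threshold, which is how Q30 figures are commonly reported; instruments and software versions differ on encoding, so the `offset` parameter, the function name, and the sample strings are assumptions.

```python
def run_metrics(quality_strings, threshold=30, offset=33):
    """Return (total_bases, percent_at_or_above_threshold).

    Each string holds one read's per-base qualities, encoded as
    ASCII characters whose code minus `offset` is the Phred score.
    """
    total = 0
    passing = 0
    for qualities in quality_strings:
        for ch in qualities:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    percent = 100.0 * passing / total if total else 0.0
    return total, percent


# Hypothetical reads: "I" encodes Q40 and "#" encodes Q2 under Phred+33.
reads = ["IIII", "##II"]
print(run_metrics(reads))  # (8, 75.0)
```

Because the function only needs quality strings, it could be applied to partial output while a run is still in progress, which is the point of monitoring quality before a week-long run completes.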

The key to developing useful preconfigured software for NGS labs is to identify universal and ubiquitous workflows. The more variation associated with a task, the less sense it makes to preconfigure it; variable tasks are best left to labs to customize so that the system matches their science. Preconfiguration is ideal for accommodating prescribed lab protocols and workflows; routine and necessary “munging” steps that convert and transfer data; and the data validation and reporting steps required to prepare data for sequencing or report results.

LIMS preconfiguration can enable software to codify best practices in NGS wet-lab procedures and data management. And when a preconfigured LIMS also leverages modern architectural styles and scripting tools, it provides two things NGS labs need: the ability to get their preferred instrumentation rapidly up and running, and the flexibility to customize the LIMS to accommodate unique laboratory procedures and workflows.