Jun 18 2018

Systems Thinking and DNA Mixtures

Do you struggle with DNA mixture interpretation? Would you like to learn more about systems thinking and how this approach can positively impact mixture interpretation results and probabilistic genotyping outcomes? Are you interested in better understanding the differences and relationships between sensitivity, resolution, analytical thresholds, and other relevant analytical figures?

At ISHI 29, Catherine Grgicak, Associate Professor in the Department of Chemistry at Rutgers University, will chair a workshop filled with experts who will show you why a systems thinking approach to validation will positively impact mixture interpretation results and probabilistic genotyping outcomes, and how you can set this up in your lab.

We recently interviewed Catherine, who described what Systems Thinking is, how the PROVEDIt DNA Database came to be, and how her background in physical chemistry drives her research.

Hi Catherine, first I’d like to thank you for being a part of another workshop at ISHI! Mixture interpretation continues to be a hot topic, so I know attendees will benefit from this workshop as well.

The introduction and subsequent adoption of probabilistic genotyping seems to have brought more powerful methods of analyzing mixtures, but also more complexity. Could you explain any challenges a lab may face when using probabilistic genotyping?

The adoption of computational solutions for inference has compelled the forensic DNA scientist to think deeply about problems that have been present for decades. It is not probabilistic genotyping that brings more complexity – the complexity has always existed and is related to the data generated from the amplification of DNA from potentially partial genomes from any number of unknown contributors; thus, it is the nature of the sample itself that compels the forensic community to implement technological advances related to data analysis and interpretation.

Implementation of new technology, however, requires validation and education, both of which require the laboratory to expend significant resources. Probabilistic genotyping systems generally output the likelihood ratio, which has become the prevailing means of communicating the strength of DNA evidence and is the ratio of the probabilities of the evidence given two mutually exclusive hypotheses. Both the evidence, E, and the hypotheses can impact the ratio, and this workshop will focus on optimizing and understanding the impacts of the former on the likelihood ratio. We shall also explore how our validation philosophy impacts the ratio, since the information content in E is inextricably linked to our processing decisions and the way in which we tested and validated our laboratory pipelines.
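In symbols, the ratio described above can be written as follows. The labels H_1 and H_2 are notational assumptions standing in for the two mutually exclusive hypotheses; the interview itself names only the evidence, E.

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
```

Written this way, it is easier to see that anything that changes the information content in E can change both the numerator and the denominator, and therefore the ratio.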

There are a few different software options available for analyzing mixtures. Can you elaborate on these options and how they differ in their approach?

There are two basic types of forensic DNA interpretation options available: those that have been described as semi-continuous and those described as fully continuous. Semi-continuous systems utilize the probability of allele dropout in their computation, while fully continuous systems utilize the intensity of the signal directly. Each developer would have made decisions about which models to use, how to compute the likelihood ratio, how to handle the number-of-contributors assertion and whether to model noise or drop-in. There is therefore interest in experimentally confirming that the use or implementation of different computational systems would have negligible impact on inference outcomes, which raises the broader question of how to complete such a task. There are generally two ways this may be completed:

Provide the developers with the same data and have them report the results in the form of publications;

Have an independent third party test each system on the same data and report the results in publication form.

Both scenarios require the same data be used and that the results be published.

Is it possible that two different software programs could interpret the data differently? If so, why might this happen, and what impact does that have?

Each computational system is an extension of the developers; thus, each will have nuances associated with the algorithms. Understanding the impact of these nuances is of interest to the forensic community and requires a significant amount of data from a variety of DNA sample types of varying complexity. If the nuances have negligible impact on inference, then laboratories can, perhaps, focus their attention on implementing and validating software that meets their own processing requirements; that is, they can focus on testing speed or end-user experience. If, however, these nuances impact outcomes, then it would be justified to investigate the origins of the differences.

The abstract for your workshop mentions mixture interpretation validation. Are there any differences in approach that need to be taken when validating this type of software, and if so, what are they?

If the validation procedure is well designed, the implementation of one system, assay or platform over another should have no bearing on the procedure itself. Within this workshop we shall introduce a systems validation approach while comparing it to traditional validation paradigms. Interestingly, the systems validation approach does not require major modifications to the sample types used during validation; rather, it requires only a modification in the way laboratories analyze and report the validation outcomes.

As a statistician, you have a different background than many who use probabilistic genotyping. What advantages does this give you when looking at the data? Do you think it allows you to see and interpret the data differently?

My background is in chemistry and forensic science. My long-run research collaborators are computer engineers, computational biologists, mathematicians, statisticians, biologists and forensic scientists. It is our collective experience that allows us to engineer systems, processes and databases that are of interest to the forensic community. Our aim is to develop systems and processes that are computationally and statistically sound and to test the systems on large sets of experimental data. It is my background in physical chemistry that has inspired me to continuously strive to understand the link between the individual state of each DNA molecule, the bulk properties of the sample and the laboratory process. It is trying to understand this link that drives our research.

Your workshop description mentions using ‘systems thinking’ for DNA mixtures. For those who are unfamiliar with the term, would you be able to describe what systems thinking is?

According to Merriam-Webster, a system is a regularly interacting or interdependent group of items forming an integrated whole. Each is delineated by its spatial and temporal boundaries, surrounded by its environment, described by its structure and purpose and expressed in its functioning. Systems Thinking is a set of synergistic analytic skills used to improve the capability of identifying and understanding systems, predicting their behaviors, and devising modifications to them in order to produce desired effects (Arnold RD, Wade JP. Procedia Computer Sci. 44 (2015) 669-678).

The forensic DNA laboratory is a system with interconnecting parts wherein each part impacts the outcomes of other parts. For example, one can conceptualize how the number of PCR cycles implemented in the PCR set-up stage will impact the injection time or voltage used during post-PCR processing, which will then impact the analytical threshold and probability of drop-out and drop-in used during signal analysis and interpretation. Given that the DNA laboratory is a complex system with interleaving parts, the application of analytic skills to identify, predict and improve behaviors is justified. Taking it a step further, we shall present results that demonstrate that a systems approach improves inference and has the potential to decrease lab-to-lab variability associated with mixture interpretation, regardless of instrument platform, assay type or interpretation tool.

NIST Fellow, John Butler, is currently leading a research team to determine the reliability of DNA profiling methods when used with different types of DNA evidence/mixtures. How do you foresee the results of this study impacting the forensic community?

The NIST scientific foundation review is exploring what is known about DNA mixture interpretation based on published literature and other data available. Hopefully the report that is produced from this study will enable the forensic community and its stakeholders to gain a better appreciation for what DNA results can and cannot achieve given the sensitivity of the techniques in use today. An important goal of the study is to aid understanding of the differences when DNA serves as a biometric (i.e., to assist in identifying someone and address source level questions) versus as circumstantial trace evidence (i.e., where only sub-source level questions can be addressed and greater uncertainty in terms of case relevance can arise due to the possibility of DNA transfer and persistence). An appreciation of these differences reflects that the appropriate use of DNA results (as well as other forensic evidence) is dependent on the question being addressed. As part of this workshop, John plans to share insights gained from this study and discuss the report findings.

Can you explain what the PROVEDIt DNA Database is and how you came up with the idea?

PROVEDIt, which stands for Project Research Openness for Validation with Empirical Data, is composed of 25,000 profiles as well as a suite of computational systems developed in a variety of environments by a multi-disciplinary, inter-institutional team.

The collection of software tools includes:

CEESIt: Computational Evaluation of Evidentiary Signal. Outputs the likelihood ratio, likelihood ratio distribution and the probability that the likelihood ratio is greater than one for randomly generated genotypes.

NOCIt: Number of Contributors. Outputs the probability distribution for the number of contributors from which the sample arose.

GGETIt: Genotype Generator & Evaluation Tool. A simulator that outputs the minimum number of contributors based on allele counts and compares it against the known number of contributors.

SEEIt: Simulating Evidentiary Electropherograms. A dynamic model written in the Stella™ environment that simulates the entire forensic process and produces simulated, well-characterized electropherograms for up to six contributors.
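To make the idea of a minimum number of contributors based on allele counts concrete, here is a minimal Python sketch of the common maximum-allele-count rule. It is not GGETIt's code, whose implementation is not described here; the function name, the locus names used as keys and the example profile are all hypothetical, and the rule assumes diploid autosomal loci where each contributor gives at most two alleles per locus.

```python
import math

def min_contributors(profile):
    """Minimum number of contributors implied by allele counts.

    `profile` maps a locus name to the allele labels observed at that locus.
    Assumption: each contributor carries at most two alleles per locus,
    so a locus showing k distinct alleles needs at least ceil(k / 2) people.
    """
    max_alleles = max((len(set(alleles)) for alleles in profile.values()), default=0)
    return math.ceil(max_alleles / 2)

# Hypothetical three-locus profile (illustration only, not PROVEDIt data).
profile = {
    "D8S1179": ["12", "13", "14"],
    "D21S11": ["28", "30", "31.2", "32.2", "33.2"],
    "TH01": ["6", "9.3"],
}

known_contributors = 3  # hypothetical ground truth for this made-up sample
estimate = min_contributors(profile)
print(estimate, estimate == known_contributors)  # prints "3 True"
```

Because allele sharing and drop-out hide alleles, this count is only a lower bound, which is why comparing it against the known number of contributors is informative.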

The 25,000 profiles were generated over a four-year period and include 1- to 5-person DNA samples, amplified with targets ranging from 1 to 0.007 ng. In the case of multi-contributor samples, the contributor ratios ranged from equal parts of each contributor to mixtures containing 99 parts of one and 1 part of the other(s). They were generated using 144 distinct laboratory conditions wherein each sample may contain pristine; damaged (i.e., UV-Vis); enzymatically/sonically degraded; and inhibited DNA.
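The target masses and contributor ratios above can be turned into rough per-contributor amounts with a few lines of arithmetic. The sketch below is an illustration only: the helper names are hypothetical, the pairings of target mass and ratio are made up rather than taken from the database, and it assumes the commonly quoted approximation of about 6.6 pg of DNA per diploid human genome.

```python
# Assumption: one diploid human genome weighs roughly 6.6 pg (approximate figure).
PG_PER_DIPLOID_GENOME = 6.6

def contributor_masses_pg(total_ng, parts):
    """Split a total DNA target mass (in ng) by ratio parts; returns picograms."""
    if total_ng < 0 or not parts or any(p <= 0 for p in parts):
        raise ValueError("total_ng must be >= 0 and all ratio parts must be > 0")
    total_pg = total_ng * 1000.0  # 1 ng = 1000 pg
    total_parts = sum(parts)
    return [total_pg * p / total_parts for p in parts]

def approx_genome_copies(mass_pg):
    """Approximate number of diploid genome copies in a given mass (pg)."""
    return mass_pg / PG_PER_DIPLOID_GENOME

# Hypothetical pairings of target mass and contributor ratio.
for total_ng, parts in [(1.0, (99, 1)), (0.007, (1, 1))]:
    masses = contributor_masses_pg(total_ng, parts)
    copies = [round(approx_genome_copies(m), 2) for m in masses]
    print(total_ng, parts, [round(m, 2) for m in masses], copies)
```

Under these assumptions, the minor contributor in a 1 ng, 99:1 mixture receives about 10 pg, and an equal split of 0.007 ng leaves each contributor with about 3.5 pg, less than one genome copy on average. That is one way to picture why low-template samples can represent partial genomes.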

The database is an outgrowth of my collaboration with Drs. Word and Cotton, who expressed interest in developing a database consisting of many mixtures amplified with a multitude of kits. I quickly understood the need for such a centralized and open database and began pipetting immediately.

Some have expressed concern with using Likelihood Ratios in testimony, claiming that jury findings may differ depending on which expert testifies, as experts may be using different statistical methods to calculate their LR results. How does the PROVEDIt DNA Database help to counteract this?

We can see many distinct uses for the database and believe it can be employed by many groups, such as:

Forensic laboratories interested in the validation of new or existing interpretation protocols, or peak detection software;

In this workshop we shall emphasize that the forensic DNA laboratory is a system wherein many conditions, parameters or software tools can be employed. There are two main reasons why two or more laboratories may report noticeably different Likelihood Ratios:

The information content in the evidence, E, is not similar between laboratories; or

Differences in model choices are impactful.

The PROVEDIt database contains several samples with excellent signal-to-noise resolution, and these would be ideal candidate samples to evaluate the latter. The other samples, however, can be used to demonstrate how one may achieve excellent signal-to-noise resolution to optimize the laboratory system for improved inference, or to demonstrate whether the information content in E is, indeed, impactful. In this workshop, we shall demonstrate that the information content in E is controllable and we shall share the way in which we were able to stabilize this across two platforms and amplification kits.

Who would benefit most from your workshop? Are there any pre-requisites?

There are no pre-requisites, though the audience should have an understanding of the basic steps in DNA processing (i.e., extraction, PCR, capillary electrophoresis and peak detection). This workshop would benefit anyone interested in learning how one can systematize the forensic DNA validation process.