2011 Data Guidelines and Presentation

Guidelines for Data Presentation
Objective
• Provide a framework that can be used as a tool for advancing standardized data presentation
Data Guidelines
• Experimental Design
o Sample procurement
o Sample preparation
o Fix/Perm
o Which fluorophore
• Controls
o Isotype?
o Single color
o FMO (fluorescence minus one)
• Instrumentation
o Appropriate lasers
o Appropriate filters
o Instrument settings
o Lin vs. Log
o Time
o A, W, H (area, width, height)
• Interpretation
o Mean, median
o %+
o CV
o SD
o Signal/noise
o Gating
o Analysis
• Presentation
o Histogram
o Dot plot
o Density plot
o Overlay
o Bar graph
First, let's address the problem
• Data analysis incorporates many disciplines, including instrumentation, statistics, biology, and photonics. Often, knowledge of one or more of these is missing.
• Many different instruments and software packages are available.
• Historical precedent
o Unfortunately, there is a large body of published work with poor data and no clear guidelines.
Some Examples of Poor Data Presentation
-Arbitrary gate that is difficult to replicate
-On-axis data are difficult to visualize, interpret, and review
2011 Nature article
Some Examples of Poor Data Presentation
-eBioscience product literature
-Normal peripheral blood stained with the listed reagents
-That's some bright CD19 and dim CD3
-The ratio between B and T cells seems off for normal blood
Some Examples of Poor Data Presentation
From Nature Medicine, 1998. Human stem cells were injected into NOD/SCID mice and were reported to reconstitute multiple lineages.
Myeloid
B cells
T cells… CD4 & CD8
Some Examples of Poor Data Presentation
“Medium-to-high FS”?
Did they backgate to ensure this was the correct gate?
Some Examples of Poor Data Presentation
An isotype control for two channels? Which one? (CD45 was on yet a third channel; no control for that?)
How was the gate actually defined on this control?
It is impossible to estimate the amount of background staining in this histogram: a gate is needed to express it!
Other graphs are shown as bivariate displays, making the data difficult to translate between formats.
% Pos?
Some Examples of Poor Data Presentation
Why are cells expressing both markers?
If these are of myeloid origin, then why is a lymphocyte gate (“R1”) applied?
The cells on the diagonal look like nonspecific staining, and in fact were probably present in the isotype control.
Some Examples of Poor Data Presentation
Nearly 100% of cells are expressing CD19. If so, then there is no “room” left over for other lineages… The data appear self-contradictory, but without percentages we cannot tell.
Some Examples of Poor Data Presentation
Same problem as for “myeloid” cells: the CD2+CD3+ cells appear to be nonspecifically stained.
The CD4 and CD8 distributions don't look like typical mature T cells… and what about the CD4+CD8+ cells?
Some Examples of Poor Data Presentation
Why do graphs “e” and “h” have so many events compared to graphs “d”, “f”, and “g”? R1 + R2 (2.5%) represents very few events…
Some Examples of Poor Data Presentation
FITC and PE appear to be over-compensated.
Example 1
An Example of Poor Data Presentation: Summary
Critical analysis of this figure shows that it does not support the contentions of the authors. This does not mean that the authors were wrong.
Reviewers should have demanded a more rigorous example dataset… but perhaps the reviewers were not FACS experts.
Guidelines can educate
Unfortunately, this example is neither unique… nor even uncommon.
Research Misconduct Inquiry
The Division of Investigative Oversight, Office of Research Integrity, is currently swamped with requests for flow cytometry-related research misconduct inquiries.
Currently, the majority of these cases involve blatant, intentional fraud. However, there is a significant trend toward flow-related guidelines, and the onus on investigators to represent data properly is growing.
Research Misconduct
Research misconduct is defined by law: 42 CFR Parts 50 and 93, Sections 93.103 and 93.104:
Research misconduct is defined as fabrication, falsification, or plagiarism … in reporting research results.
Falsification is manipulating … changing or omitting data or results such that the research is not accurately represented in the research record.
Misconduct can be committed intentionally, knowingly, or recklessly.
There is no wrong way to analyze your data
Meaning: investigators are free to choose:
• Which plot types to display
• Placement of gates for analysis
• Which statistics to report
• Number of events to display or collect
• Which software package to use
• How many times to reanalyze
There is definitely a wrong way to analyze your data
Meaning: investigators' decisions can lead to incorrect data generation or interpretation:
• Inappropriate gates for analysis (e.g., a lymphocyte gate for CD15 staining, or inconsistent gates)
• Misleading or inconsistent plots for display
• Inappropriate controls (e.g., using an isotype control to set gates)
• Inappropriate number of events collected (too few events for meaningful and accurate statistical comparison)
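The point about collecting too few events can be made quantitative. A minimal Python sketch, assuming simple Poisson counting statistics, under which the coefficient of variation of a count of r positive events is about 1/√r. The function names are hypothetical, and the 10% CV target in the example is an illustrative choice, not a guideline value.

```python
import math


def count_cv(positive_events: int) -> float:
    """Approximate CV of a count, assuming Poisson statistics: CV = 1/sqrt(r)."""
    if positive_events <= 0:
        raise ValueError("need at least one positive event")
    return 1.0 / math.sqrt(positive_events)


def events_to_collect(frequency: float, target_cv: float) -> int:
    """Total events needed for the gated subset to reach the target CV.

    frequency: expected fraction of events in the subset (0.001 means 0.1%).
    target_cv: desired CV of the subset count (0.10 means 10%).
    """
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    positives_needed = math.ceil((1.0 / target_cv) ** 2)
    return math.ceil(positives_needed / frequency)


if __name__ == "__main__":
    # A 0.1% subset counted to ~10% CV needs ~100 positive events,
    # i.e. roughly 100,000 total events.
    print(events_to_collect(0.001, 0.10))
```

The rarer the subset, the more total events must be collected for the same precision, which is why the number collected belongs in the report.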
Implementation of Guidelines by J. Exp. Med.
A set of guidelines for publication of flow cytometry data has been implemented by the Journal of Experimental Medicine.
All papers submitted for review will be required to comply with the guidelines, including submission of supplementary information, in order to be reviewed.
Papers with sophisticated flow cytometric analysis may undergo an independent review to ensure the appropriateness of the analysis and presentation.
MIFlowCyt
Minimum Information about a Flow Cytometry Experiment
ISAC Recommendation
The fundamental tenet of scientific research is that the published results of any study have to be open to independent validation or refutation. MIFlowCyt establishes criteria for recording and reporting information about the flow cytometry experiment overview, samples, instrumentation, and data analysis. It promotes consistent annotation of clinical, biological, and technical issues surrounding a flow cytometry experiment by specifying the requirements for data content and by providing a structured framework for capturing information.
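To make the idea of a structured framework concrete, here is a hypothetical Python record that groups information under the four areas named above (experiment overview, samples, instrumentation, data analysis). The class and field names are illustrative assumptions only; this is not the official MIFlowCyt checklist or any ISAC schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FlowExperimentRecord:
    """Hypothetical record loosely organized around the MIFlowCyt areas."""
    overview: str = ""                                   # purpose and design
    samples: list = field(default_factory=list)          # sample descriptions
    instrumentation: dict = field(default_factory=dict)  # instrument, lasers, filters
    data_analysis: dict = field(default_factory=dict)    # gating, compensation, statistics

    def missing_sections(self) -> list:
        """Names of the sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    record = FlowExperimentRecord(overview="PBMC phenotyping", samples=["donor 1 PBMC"])
    print(record.missing_sections())  # ['instrumentation', 'data_analysis']
```

A check like `missing_sections()` is one simple way to notice, before submission, that part of the required information has not been captured.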
Guidelines: Why do we need them?
• A consistent presentation style ensures better communication of data to readers and listeners
• Speaking a common language
• Faster interpretation; understanding nuances
• Provides a level of confidence that the data have been appropriately generated and analyzed
• Allows reviewers and readers to focus on the point of the presentation, avoiding distractions from inappropriate or inconsistent presentations
Guidelines: What they are NOT
They will not define how to do science or how to analyze and interpret the data.
In most cases, they are not requirements; they simply codify the “between the lines” information.
They will neither prevent nor reduce purposeful fraud.
They can reduce reckless science.
They can reduce confusion and ambiguity within published data.
Introduction
Principles and Guidelines
A few examples of the principles and guidelines for data presentation follow.
Hardware/Software
Principles and Guidelines
Information about the instrument configuration should be provided.
Why:
Different configurations (lasers, filters, etc.) can result in very different sensitivities, compensation requirements, etc.
Some experiments (for example, fluorescence intensity comparisons across different days) require that the instrument be carefully calibrated. Interpretation of the significance of the results may require knowledge of these procedures.
Instrument
• Manufacturer
o Identify the FACS instrument and software used to collect, compensate, and analyze the data.
o Include the model and version where more than one exists.
• Light source
o Type
o Wavelength
o Power
• Optics: filter types (band pass, long pass) and specifications (e.g., 530/30)
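Band-pass filters are conventionally written as center/bandwidth, so a 530/30 filter transmits roughly 515 to 545 nm. A small Python sketch for converting between that notation and a wavelength range; the helper names are hypothetical, and a long-pass filter is modeled simply as open-ended above its cut-on wavelength.

```python
def bandpass_range(spec: str) -> tuple:
    """Convert band-pass notation 'center/width' (nm) to a (low, high) range."""
    center, width = (float(x) for x in spec.split("/"))
    return center - width / 2, center + width / 2


def bandpass_spec(low: float, high: float) -> str:
    """Convert a (low, high) transmission range in nm to 'center/width' notation."""
    return f"{(low + high) / 2:g}/{high - low:g}"


def longpass_range(cut_on: float) -> tuple:
    """A long-pass filter transmits everything above its cut-on wavelength."""
    return float(cut_on), float("inf")


if __name__ == "__main__":
    print(bandpass_range("530/30"))  # (515.0, 545.0)
    print(bandpass_spec(505, 515))   # 510/10
```

Reporting filters either way is fine; what matters is that readers can recover the actual transmission window.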
Hardware/Software
Instrument Configuration
Reporting the instrument configuration is a delicate balance between providing enough information to be useful and providing so much that it is no longer helpful.
Instrument configuration can be summarized in three sections:
• Optical
• QA/QC
• Compensation
There is no “right” procedure (but there are “wrong” procedures for some kinds of experiments). Knowing the instrument configuration is necessary to fully interpret the data.
Hardware/Software
Instrument Configuration: Optical
The optical configuration determines what fluorescence measurements were made by the instrument. There are two tables: one for lasers, the other for detectors.
Lasers
Number   Wavelength   Power and Type
1        488 nm       15 mW Argon Ion
2        532 nm       200 mW Pulsed Diode
3        408 nm       25 mW Diode
4        635 nm       35 mW HeNe
FACS core facilities can create these tables and supply them to users.
Detectors
Name   Laser   Wavelength range (nm)   Dyes
B510   1       505-515                 FITC
B710   1       680-730                 PerCP-Cy5.5
G565   2       565-585                 PE
G605   2       600-620                 TR-PE
G660   2       650-680                 Cy5PE, PI
V450   3       420-480                 Pacific Blue, Cascade Blue
V655   3       650-680                 QD650
R660   4       650-680                 APC
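Because core facilities can supply these tables, it may help to keep them in a machine-readable form as well. A hypothetical Python sketch that stores the laser and detector tables above as plain dictionaries and answers simple questions, such as where a given dye is read; the data structure and function names are assumptions made for illustration.

```python
LASERS = {
    1: (488, "15 mW Argon Ion"),
    2: (532, "200 mW Pulsed Diode"),
    3: (408, "25 mW Diode"),
    4: (635, "35 mW HeNe"),
}

# detector name: (laser number, low nm, high nm, dyes)
DETECTORS = {
    "B510": (1, 505, 515, ["FITC"]),
    "B710": (1, 680, 730, ["PerCP-Cy5.5"]),
    "G565": (2, 565, 585, ["PE"]),
    "G605": (2, 600, 620, ["TR-PE"]),
    "G660": (2, 650, 680, ["Cy5PE", "PI"]),
    "V450": (3, 420, 480, ["Pacific Blue", "Cascade Blue"]),
    "V655": (3, 650, 680, ["QD650"]),
    "R660": (4, 650, 680, ["APC"]),
}


def detector_for(dye: str) -> str:
    """One-line description of the detector and laser used for a dye."""
    for name, (laser, low, high, dyes) in DETECTORS.items():
        if dye in dyes:
            wavelength, source = LASERS[laser]
            return f"{dye}: {name}, {low}-{high} nm, excited by {wavelength} nm ({source})"
    raise KeyError(f"{dye} is not in the detector table")


def detectors_on_laser(laser: int) -> list:
    """Names of all detectors fed by the given laser, in table order."""
    return [name for name, (laser_no, *_rest) in DETECTORS.items() if laser_no == laser]


if __name__ == "__main__":
    print(detector_for("APC"))
    print(detectors_on_laser(2))  # ['G565', 'G605', 'G660']
```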
Hardware/Software
Instrument Configuration: QA/QC
Knowledge of the QA/QC procedures is necessary to understand how data analysis was performed.
Do the gates move from experiment to experiment?
Are MFI calculations compared between experiments?
Is sensitivity equivalent across experiments?
Relevant QA/QC procedures can likely be summarized by a limited set of options that authors select from:
o No daily QC (i.e., fire up the instrument and hope that yesterday's settings are close enough)
o Alignment using beads: set up the instrument so that the same output fluorescence is observed on each channel every day
o Set the instrument up to the same voltages and settings each day (record beads for QA)
o Set the instrument up so that unstained cells are in the first decade of fluorescence
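For the bead-based option, the daily check can be as simple as comparing today's bead intensity in each channel against a target value. A minimal Python sketch; the target intensities in the example and the ±5% tolerance are hypothetical values chosen only for illustration, not recommended limits.

```python
def qc_report(measured: dict, targets: dict, tolerance: float = 0.05) -> dict:
    """Flag channels whose bead median fluorescence deviates from its target.

    measured, targets: channel name -> bead median fluorescence intensity.
    tolerance: allowed fractional deviation (0.05 means +/-5%); an assumption.
    Returns channel -> (fractional deviation, passed?).
    """
    report = {}
    for channel, target in targets.items():
        if channel not in measured:
            raise KeyError(f"no bead measurement for {channel}")
        deviation = (measured[channel] - target) / target
        report[channel] = (deviation, abs(deviation) <= tolerance)
    return report


if __name__ == "__main__":
    targets = {"B510": 20000, "G565": 15000}  # hypothetical target values
    today = {"B510": 20500, "G565": 13500}    # hypothetical bead medians
    for channel, (dev, ok) in qc_report(today, targets).items():
        print(f"{channel}: {dev:+.1%} {'pass' if ok else 'FAIL'}")
```

Recording a report like this each day is also what lets readers judge whether MFI comparisons across experiments are meaningful.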
Hardware/Software
Instrument Configuration: Compensation
A very brief description of how compensation was accomplished is all that is needed.
• What were the controls? (Beads, cells, combinations)
• Was compensation manual or automatic?
• What software was used to compensate?
• Was manual adjustment of compensation necessary?
This helps reviewers interpret distributions that they might otherwise attribute to improper compensation.
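Whatever software is used, compensation is commonly described as estimating a spillover matrix from single-stain controls and then multiplying each event by the inverse of that matrix. A minimal pure-Python sketch of that general approach; the control medians in the example are invented, the function names are hypothetical, and real packages add refinements not shown here.

```python
def spillover_coefficient(pos_primary, pos_secondary, neg_primary, neg_secondary):
    """Fraction of a dye's signal that spills into a secondary detector,
    from medians of the positive and negative populations of a single-stain control."""
    return (pos_secondary - neg_secondary) / (pos_primary - neg_primary)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compensate(events, spillover):
    """Recover dye signals from observed detector values.

    spillover[i][j] is the fraction of dye i's signal seen in detector j
    (diagonal = 1), so observed = true x S and true = observed x S^-1.
    """
    inverse = invert(spillover)  # invert once, apply to every event
    n = len(spillover)
    return [[sum(ev[i] * inverse[i][j] for i in range(n)) for j in range(n)]
            for ev in events]


if __name__ == "__main__":
    # Invented FITC single-stain medians: FITC+ (5000 FITC, 1050 PE), FITC- (100, 70)
    fitc_into_pe = spillover_coefficient(5000, 1050, 100, 70)  # 0.2
    S = [[1.0, fitc_into_pe],  # FITC row: FITC detector, PE detector
         [0.02, 1.0]]          # PE row (invented spillover into FITC)
    print(compensate([[1000.0, 200.0]], S))  # approximately [[1000.0, 0.0]]
```

Stating which controls fed the spillover estimate, and whether values were then adjusted by hand, is exactly the information the bullets above ask for.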
Graphs-General
Principles and Guidelines
Graph axis labels should include (at a minimum) the reagent being measured.
Why:
Interpretation of the graph is much faster; the reader does not have to translate each label.
In the case of fluorescent antibodies, both the specificity and the fluorochrome should be indicated.
Do not use “FL1” or “P1” as a label.
Fluorescent Reagent Description
• Binding target
• Reporter (fluorochrome)
• Clone name or number
• Reagent manufacturer
• Reagent catalogue number
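A hypothetical Python record for the reagent description above, with a helper that builds an axis label showing both specificity and fluorochrome and rejects generic detector names such as FL1 or P1. The class, the "fluorochrome: target" label format, and the regular expression are illustrative assumptions; the clone, manufacturer, and catalogue values in the example are placeholders, not real products.

```python
import re
from dataclasses import dataclass

# Generic detector/parameter names that should never appear as axis labels.
GENERIC_LABEL = re.compile(r"^(FL|P)\d+$", re.IGNORECASE)


@dataclass
class FluorescentReagent:
    target: str            # what the reagent binds, e.g. "CD45"
    fluorochrome: str      # reporter, e.g. "Cy5PE"
    clone: str             # clone name or number
    manufacturer: str
    catalogue_number: str

    def axis_label(self) -> str:
        """Label in a 'fluorochrome: target' style; refuses detector names."""
        for part in (self.target, self.fluorochrome):
            if GENERIC_LABEL.match(part.strip()):
                raise ValueError(f"'{part}' is a detector name, not a reagent")
        return f"{self.fluorochrome}: {self.target}"


if __name__ == "__main__":
    cd45 = FluorescentReagent("CD45", "Cy5PE", "clone-X", "Vendor A", "cat-0000")
    print(cd45.axis_label())  # Cy5PE: CD45
```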
Graphs-General
Principles and Guidelines
The number of events displayed in any graph should be indicated.
Why:
• The number of events making up a display can affect how the display looks
• The number of events should be considered when interpreting the precision of the analysis
Graphs-General
Annotating Graphs
Indicate event counts (and, where relevant, gate frequencies) with a simple number within or near each graphic, or list them in the figure legend.
[Figure 001.01: Annotated example plots. Total PBMC (10000 events) and Lymphocytes gate (6296 events, 63.0%); axes labeled ForSc, Cy5PE: CD45, PhyEry: CD16, and Fluor: CD14. Callouts: consistent use of color helps minimize extraneous text; axis labels show both the measurement and the fluorochrome.]
Scaling or Axis labels
• Show all parts of the plot axis that indicate the scaling used (Lin, Log, Bi-exponential).
• Numerical values for axis “ticks” can be eliminated except when necessary to clarify the scaling.
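The scaling choice changes how the same values are drawn, especially near zero. The sketch below compares linear, log10, and an arcsinh transform in Python. Note that arcsinh is swapped in here as a simpler stand-in for the bi-exponential (logicle) display: it shares the property of being roughly linear near zero and roughly logarithmic at high values, but it is not the logicle function, and the cofactor of 150 is an assumption.

```python
import math


def linear(x: float) -> float:
    """Linear scaling: values are plotted as measured."""
    return x


def log10_scale(x: float) -> float:
    """Log scaling: undefined for zero and negative values."""
    if x <= 0:
        raise ValueError("log scale cannot display values <= 0")
    return math.log10(x)


def arcsinh_scale(x: float, cofactor: float = 150.0) -> float:
    """Roughly linear for |x| well below cofactor, roughly logarithmic above it."""
    return math.asinh(x / cofactor)


if __name__ == "__main__":
    for value in (-50.0, 0.0, 50.0, 1000.0, 100000.0):
        try:
            log_text = f"{log10_scale(value):.2f}"
        except ValueError:
            log_text = "n/a"
        print(f"{value:>9.0f}  log10={log_text:>5}  asinh={arcsinh_scale(value):.2f}")
```

Values at or below zero, which are common after compensation, have no position on a log axis and end up piled against it, while the arcsinh-style axis keeps them visible; this is one reason the scaling used should be apparent from the plot.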
Graphs-General
Principles and Guidelines
To convey the quantitative representation of subsets in graphical displays, the calculated frequency of gated events must be displayed. The graph itself cannot convey such information.
Why:
Depending on how many events are displayed, the appearance of a subset may be quite different. The only way to convey the frequency accurately is to provide a numerical value.
Histograms are notoriously misleading about frequencies.
Graphs-General
Graphs Cannot Convey Frequencies
[Figure: ForSc histograms (0-200) with a Gate on high forward-scatter events, plotted with y-axis # Cells (0-250) in one panel and % of Max (0-100) in the other.]
Two datasets. What is the representation of “large” (high forward-scatter) cells? Does the “red” distribution have more?
Figure 001.04
Graphs-General
Graphs Cannot Convey Frequencies
[Figure: ForSc histograms (y-axis # Cells, 0-250) of two datasets, Blue and Red, each containing 4,922 events.]
Which distribution has more cells?
Figure 001.04
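The same point can be reproduced numerically. In the Python sketch below, two hypothetical 5-bin forward-scatter histograms contain the same number of events in the gate (the last two bins), yet the gated frequency differs, and % of Max normalization makes the red tail look twice as tall as the blue one even though the frequency ratio is smaller. The bin counts are invented for illustration.

```python
def percent_of_max(counts):
    """Rescale histogram bins so the tallest bin is 100."""
    peak = max(counts)
    return [100.0 * c / peak for c in counts]


def gated_count(counts, first_bin, last_bin):
    """Number of events in bins first_bin..last_bin (inclusive)."""
    return sum(counts[first_bin:last_bin + 1])


def gated_frequency(counts, first_bin, last_bin):
    """Fraction of all events in bins first_bin..last_bin (inclusive)."""
    return gated_count(counts, first_bin, last_bin) / sum(counts)


# Hypothetical ForSc histograms; the "large cell" gate covers bins 3-4.
blue = [50, 200, 100, 40, 10]
red = [25, 100, 50, 40, 10]

for name, counts in (("blue", blue), ("red", red)):
    print(name,
          percent_of_max(counts),
          gated_count(counts, 3, 4),
          f"{gated_frequency(counts, 3, 4):.1%}")
# blue [25.0, 100.0, 50.0, 20.0, 5.0] 50 12.5%
# red [25.0, 100.0, 50.0, 40.0, 10.0] 50 22.2%
```

Only the printed numbers answer the question unambiguously: equal gated counts, but different frequencies.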
Intensity measurement
Explicitly define the statistic applied (mean, median, geometric mean).
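These statistics can differ substantially for the same population, which is why the one used must be named. A Python sketch using the standard library `statistics` module (`geometric_mean` requires Python 3.8 or later); the intensity values are invented, and the function name is hypothetical. The geometric mean is only defined here when every value is positive, which compensated data often are not.

```python
import statistics


def intensity_summary(values):
    """Mean, median and (when all values are positive) geometric mean."""
    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
    # Report the geometric mean as None when any value is zero or negative.
    if all(v > 0 for v in values):
        summary["geometric mean"] = statistics.geometric_mean(values)
    else:
        summary["geometric mean"] = None
    return summary


if __name__ == "__main__":
    # One bright outlier pulls the mean far above the median.
    print(intensity_summary([100, 200, 400, 800, 10000]))
    print(intensity_summary([-20, 100, 300]))
```

For the first example the mean is 2300, the median 400, and the geometric mean about 577, so "MFI" without qualification could mean any of three quite different numbers.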
Graphs-General
Principles and Guidelines
The choice of smoothing and specific display type is up to the author. Choose whichever graph and display options most readily convey the information needed to interpret the experiments, but be consistent across all graphs within an analysis.
Why:
There is no single “best” way to display data. Each display type has advantages and disadvantages. However, using different displays in different graphs may mislead readers, because each graph type emphasizes different features of the data.
Gating
Principles and Guidelines
Whenever gated analyses are performed, an illustration of the gating process should be shown.
Why:
The way in which cells are gated can dramatically affect the analysis and interpretation, particularly when rare populations are involved.
Backgating demonstrates how each gate has affected the analysis, and can demonstrate that the gating process has not artefactually selected for the subsets being analyzed.
The gating “tree” teaches readers how to analyze data when they do similar experiments.
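A gating tree can be reported as a simple hierarchy in which each gate lists its event count and its frequency relative to both its parent and the total. A minimal Python sketch: the Total PBMC (10000 events) and Lymphocytes (6296 events, 63.0%) numbers come from the annotated example figure, while the CD45+ child gate and its count are hypothetical, and the class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    name: str
    count: int
    children: list = field(default_factory=list)

    def add(self, child: "Gate") -> "Gate":
        """Attach a child gate and return it so it can receive its own children."""
        self.children.append(child)
        return child


def report(gate: Gate, total: int, parent_count=None, depth: int = 0) -> list:
    """Indented lines giving count, % of parent and % of total for every gate."""
    if parent_count is None:
        line = f"{gate.name}: {gate.count} events"
    else:
        line = (f"{gate.name}: {gate.count} events, "
                f"{gate.count / parent_count:.1%} of parent, "
                f"{gate.count / total:.1%} of total")
    lines = ["  " * depth + line]
    for child in gate.children:
        lines.extend(report(child, total, gate.count, depth + 1))
    return lines


root = Gate("Total PBMC", 10000)
lymphocytes = root.add(Gate("Lymphocytes", 6296))
lymphocytes.add(Gate("CD45+ (hypothetical)", 4500))
print("\n".join(report(root, root.count)))
# Total PBMC: 10000 events
#   Lymphocytes: 6296 events, 63.0% of parent, 63.0% of total
#     CD45+ (hypothetical): 4500 events, 71.5% of parent, 45.0% of total
```

Stating both percentages avoids a common ambiguity: "45%" and "71.5%" describe the same gate relative to different parents.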
Gating
Principles and Guidelines
Unless otherwise explicitly stated, gating is assumed to have been performed subjectively.
Why:
By convention.
Gating
Principles and Guidelines
The use of control samples to set gates should be shown; the algorithm used to place gates should be explicitly defined if placement was not subjective.
Why:
In many cases, subjective placement of gates is a reasonable way to analyze the data; interpretation will not be affected by minor relocations of the gate. However, some types of analysis require rigorous placement of gates to provide the most meaningful data. If gate placement was algorithmic, then the algorithm must be described and shown.
Gating
Gate Placement Algorithms
Purely subjective
Illustration is always useful. Unlikely to be acceptable for quantitative fluorescence measurements, identification of dimly expressing subsets, or discrimination between overlapping subsets.
Based on control stains (unstained, FMO, etc.)
The control sample must be shown, along with a description of how it was used to place the gate. If the gates move for different types of samples (e.g., treated vs. untreated), then at least one example of each should be given.
Objective algorithm
Detail the algorithm (e.g., “Top 2% of events”; “Autogate defined by software”).
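The two non-subjective options can be written down precisely, which is what the guideline asks for. A Python sketch under stated assumptions: a control-based gate placed at a chosen percentile of an FMO control (the 99.5th percentile is an arbitrary example, not a recommendation) and an objective "top 2% of events" gate, both using a simple nearest-rank percentile. Function names are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data <= it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need data and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def fmo_gate(fmo_control, pct=99.5):
    """Positivity threshold taken from the FMO control (pct is an assumption)."""
    return nearest_rank_percentile(fmo_control, pct)


def top_fraction_gate(sample, fraction=0.02):
    """Threshold selecting roughly the brightest `fraction` of events."""
    return nearest_rank_percentile(sample, 100 * (1 - fraction))


def percent_positive(sample, threshold):
    """Percentage of events strictly above the threshold."""
    return 100.0 * sum(v > threshold for v in sample) / len(sample)


if __name__ == "__main__":
    sample = list(range(1, 101))  # stand-in intensities
    threshold = top_fraction_gate(sample)
    print(threshold, percent_positive(sample, threshold))  # 98 2.0
```

Whichever rule is used, reporting the control, the percentile or fraction, and the resulting threshold lets a reader reproduce the gate.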
Experimental and Sample Information
• How were cell suspensions prepared?
o Specific proteases
o Filtration
o Lysing agents
o Fix/Perm reagents
Implementation of Guidelines by J. Exp. Med.
In addition to ensuring that primary data presentation conforms to the guidelines, authors will also be expected to submit a single additional supplementary section devoted to the flow cytometry.
This section will include:
• Table of instrument information (template provided online)
• Gating tree example(s)
• Gating control(s)
• Additional analyses pertinent to the interpretation of the flow cytometric data
References
Perfetto et al., J Immunol Methods, 2006
Keeney et al., Cytometry, 1998
Cytometry 30(5), 1997
MIFlowCyt 1.0
http://ucflow.blogspot.com/2011/04/displaytransformation-and-flowjo.html (bi-exponential display)
Cytometry A 73A:384-385
Perfetto SP, Chattopadhyay PK, Roederer M. Seventeen-colour flow cytometry: unravelling the immune system. Nature Reviews Immunology 4, 648-655 (August 2004)