Data management (low density microarrays)

Raw data: Initially, two fluorescence intensity values are obtained
for each gene from two replica spots printed on the microarray
(example). The deviation between the two replica spot measurements
should be very small. A larger deviation indicates technical problems
and calls for visual inspection of spot intensity and quality.
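
The replica spot check can be sketched in a few lines of Python. This is only an illustration: the relative-difference measure, the 20% threshold, the function names and the example intensities are assumptions and are not taken from the procedure described here.

```python
def replica_deviation(spot_a, spot_b):
    """Relative difference of two replica spots: |a - b| / mean(a, b)."""
    mean = (spot_a + spot_b) / 2.0
    if mean == 0:
        return 0.0
    return abs(spot_a - spot_b) / mean


def flag_for_inspection(spots, max_deviation=0.2):
    """Return the genes whose replica spots disagree by more than max_deviation.

    spots maps a gene name to its two replica spot intensities.
    max_deviation is a hypothetical cut-off, not a documented value.
    """
    return [gene for gene, (a, b) in spots.items()
            if replica_deviation(a, b) > max_deviation]


# Hypothetical intensities for two genes
spots = {"IL8": (1000.0, 1040.0), "TNF": (500.0, 900.0)}
print(flag_for_inspection(spots))
```

Genes returned by `flag_for_inspection` would be the ones to inspect visually on the scanned image.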

Raw data acquisition: Each microarray is scanned six times at
increasing photomultiplier (PMT) settings to account for the wide
range of fluorescence signals obtained from
fluorophore-cRNA/oligonucleotide probe hybridization. Specifically,
abundant mRNAs give strong signals at low scan intensity but
saturated, non-linear signals at higher scan intensity, while
low-abundance mRNAs may only be detected at maximal scan intensity.
This generates six image files (the original scans), which are stored
as .TIFF files. For each gene, twelve numerical values (six scans
times two replica spots) are obtained and integrated into a single
intensity value (called the I value) by the MAVI software, which was
developed at MWG Biotech (for an example see here: pdf, 451kB).

Normalization: Since the number of genes on the Inflammation array is
small and most of them are highly regulated, "classical"
normalization between two samples using the total fluorescence
intensity derived from all gene expression measurements is not
possible.

We use a number of housekeeping genes (view list) that are detected
over a range of different signal intensities to calculate an average
housekeeping gene intensity for each individual array (example).

The average housekeeping gene intensity is then used to calculate a
relative signal intensity for each inflammatory gene, called the IPC
value (example). This relative intensity value enables comparisons
across different experiments. To obtain the IPC value, the logarithmic
mean intensity of all housekeeping genes is calculated first. The
intensity value of each gene is then expressed as a percentage of this
mean and divided by 100, i.e. as a fraction of the mean.
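
The IPC calculation can be illustrated with a minimal Python sketch. It assumes that the "logarithmic mean" is the mean of the log-transformed intensities converted back to the intensity scale (the geometric mean). The function names, gene names and intensities are hypothetical.

```python
import math


def log_mean(values):
    """Mean of the log intensities, back-transformed (geometric mean).

    Assumes strictly positive intensities; math.log raises ValueError otherwise.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ipc_values(intensities, housekeeping_genes):
    """Express every I value as a fraction of the housekeeping reference.

    intensities maps gene name -> I value for one array.
    """
    reference = log_mean([intensities[g] for g in housekeeping_genes])
    return {gene: value / reference for gene, value in intensities.items()}


# Hypothetical I values from a single array
array = {"HK1": 100.0, "HK2": 400.0, "IL8": 50.0}
print(ipc_values(array, ["HK1", "HK2"]))
```

With these numbers the housekeeping reference is about 200, so the IPC value of the hypothetical IL8 entry is about 0.25.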

Lower detection level: We use Arabidopsis oligonucleotide probes to
determine background hybridization (example). We also use the average
signal of each inflammatory gene across a large number of different
samples. Signal values that lie at least two standard deviations (SD)
above this average indicate relevant (basal) expression of the gene
(example).
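
The two-SD criterion can be written down as a short Python sketch. The use of the sample standard deviation and all names and values are assumptions made for illustration.

```python
import statistics


def expression_threshold(reference_signals, n_sd=2.0):
    """Average signal of one gene across many samples plus n_sd standard deviations.

    Uses the sample standard deviation (an assumption; the source does not
    say which estimator is used).
    """
    mean = statistics.mean(reference_signals)
    sd = statistics.stdev(reference_signals)
    return mean + n_sd * sd


def is_expressed(signal, reference_signals, n_sd=2.0):
    """True if the signal lies at least n_sd SD above the reference average."""
    return signal >= expression_threshold(reference_signals, n_sd)


# Hypothetical signals of one gene across a collection of samples
reference = [1.0, 2.0, 3.0]
print(expression_threshold(reference))
print(is_expressed(4.5, reference))
```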

Ratio: We routinely label all cRNAs with Cy3 and hybridize each sample
on a single microarray. Samples derived from the same biological
experiment form a group (e.g. control, treatment 1, treatment 2, ...).
We always name the control (B) and the treatments (S1) to (Sn).
Dividing the gene expression value (I or IPC) of one sample by the
corresponding value of another sample from the same group yields a
measure of relative gene expression, which is called the ratio of gene
expression. For standard ratio comparisons (e.g. (S1)/(B), (S2)/(B)
etc.) we have created several Excel macros (example).
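
The standard ratio comparisons can be sketched in Python as follows. This is not the Excel macro mentioned above, only a minimal illustration of the same division. The function name, the data layout and the example values are assumptions.

```python
def expression_ratios(group, control="B"):
    """Divide each treatment value by the control value of the same gene.

    group maps sample name -> {gene: I or IPC value}.
    Genes with a missing or non-positive control value are skipped.
    """
    baseline = group[control]
    ratios = {}
    for sample, values in group.items():
        if sample == control:
            continue
        ratios[f"{sample}/{control}"] = {
            gene: value / baseline[gene]
            for gene, value in values.items()
            if gene in baseline and baseline[gene] > 0
        }
    return ratios


# Hypothetical IPC values for one control and two treatments
group = {
    "B": {"IL8": 0.5, "TNF": 0.2},
    "S1": {"IL8": 2.0, "TNF": 0.2},
    "S2": {"IL8": 1.0, "TNF": 0.8},
}
print(expression_ratios(group))
```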

Data derived from pairwise comparisons can be depicted as bar graphs
using a SigmaPlot macro. The SigmaPlot graph also contains the
intensity values for each gene under investigation. To make the data
easy to navigate, genes are ordered into functional groups (example).

CytoBASE: Registered users can view and analyze all their experiments,
as well as a set of publicly accessible experiments, using CytoBASE.

Customized analysis: Data very often need to be extracted and prepared
for presentations or publications (e.g. as a table or SigmaPlot
graph). We have used a number of different options to meet these
needs, e.g. formatted tables and bar graphs. Depending on the demands
of the individual microarray project, we offer help based on our own
experience to analyze the data in more detail.

All users obtain a "standard result file", which is sent via email and
can also be retrieved from CytoBASE. For an example see here.
CytoBASE always contains the most recent version of a results file.

Data management (high density microarrays)

DNA oligonucleotide microarray platform and raw data acquisition and storage
The microarray laboratory is equipped with an Agilent G2565C scanner
with a maximum resolution of 2 µm, which allows automated scanning
of up to 48 slides using arrays carrying up to 1 million
oligonucleotide probes per slide. Reverse transcription,
amplification, fluorophore labeling, cRNA fragmentation and
hybridization procedures are all performed using Agilent chemistry and
reagents and highly standardized workflows. However, the lab also has
experience with a number of other cDNA synthesis, amplification and
labeling protocols. In order to maintain a high level of flexibility
and to ensure highest reliability of output data, many quality control
(QC) routines have been introduced in the course of sample processing
(e.g. assessment of absolute and relative quality of Input-RNA samples;
comparison of yields, fluorescence incorporation rates and fragment
lengths of labeled cRNA samples). This comprehensive monitoring
provides the opportunity to selectively repeat single reaction steps of
“outlier samples” within experimental series prior to the final (most
expensive) microarray hybridization step. After microarray scanning,
the resulting TIFF images are subjected to raw data extraction
using Feature Extraction (FE) software (V10.7), largely following the
recommended default protocols. All relevant information pertaining to
the current study (e.g. processed sample characteristics,
utilized array batches, adverse effects, ...) is routinely documented
in standardized Excel formats and will finally be attached to CytoBASE
along with the microarray results.
DNA microarray data analysis
Microarray data analysis can be divided into three major steps:
Step 1: Quality control and data transformation procedures
This first part of analysis pertains to the overall technical quality
of the microarray data. Local impairments in hybridization performance
are identified, documented and, where necessary, marked by manual
flagging of affected regions. QC reports generated by FE software
algorithms, which capture many QC-relevant parameters (e.g.
“behavior” of exogenous spike-in transcripts, number and local
distribution of different outlier spots, overall signal intensity
distribution, …), are thoroughly inspected for every microarray
hybridization. If a particular microarray data set turns out to be of
insufficient quality, the respective hybridization is repeated
immediately. Furthermore, it is checked at this stage whether
biological positive or negative controls behave as expected, and
whether consistently altered signal intensity levels of genes that
were overexpressed, deleted or knocked down can be used to
unequivocally confirm, in retrospect, correct sample identity and
faultless performance of the underlying biological experiment. Next,
optimal data normalization and transformation procedures are
established in a study-specific manner.
For single-color mRNA expression microarrays a linear scaling approach
(based on the 75th percentile of each array's intensity distribution)
combined with the introduction of appropriate surrogate values for
unreliably low intensity measurements is used by default. However,
different data transformation strategies could become necessary in
particular cases. Finally, principal component analysis (PCA) is
performed to identify “outlier data sets” and to relate the degree of
intra-class variability to the extent of inter-class variability (an
important criterion to judge if the assessed number of replicates per
class is sufficient for “in depth analyses”).
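
The default transformation for single-color arrays described above (linear scaling to the 75th percentile plus surrogate values for unreliably low intensities) can be sketched in Python as follows. This is a minimal illustration and not the lab's actual scripts: the common target value, the quantile method, the surrogate floor and all names are assumptions.

```python
import statistics


def percentile_75(values):
    """75th percentile of one array's intensity distribution."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def scale_arrays(arrays, target=None):
    """Linearly scale each array so that its 75th percentile equals target.

    arrays maps array name -> list of intensities.
    If no target is given, the median of all 75th percentiles is used
    (an assumption made for this sketch).
    """
    p75 = {name: percentile_75(vals) for name, vals in arrays.items()}
    if target is None:
        target = statistics.median(p75.values())
    return {name: [v * target / p75[name] for v in vals]
            for name, vals in arrays.items()}


def apply_surrogate(values, floor):
    """Replace unreliably low intensities by a surrogate value (floor)."""
    return [max(v, floor) for v in values]


# Hypothetical intensities of two arrays
arrays = {"array_1": [1.0, 2.0, 3.0, 4.0, 5.0],
          "array_2": [2.0, 4.0, 6.0, 8.0, 10.0]}
scaled = scale_arrays(arrays)
print({name: apply_surrogate(vals, floor=2.0) for name, vals in scaled.items()})
```

After scaling, both hypothetical arrays share the same 75th percentile, and values below the chosen floor are set to the floor.
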
Step 2: Standardizable and initial biological data analysis
After FE-mediated data extraction, followed by QC and data
transformation procedures, microarray data are further processed using
Excel macros and R scripts. These tools have been developed in the
microarray lab specifically to pre-process Agilent microarray data.
They are required i) to convert processed raw data into clearly arranged
Excel sheets, ii) to reduce the complexity of the data by selecting only
the most informative part of the data initially acquired per gene, iii) to
incorporate adequate sorting keys and supplemental information to
facilitate navigation through the complex data, and finally iv) to
introduce meaningful ratios of relative gene expression in the case of
single-color microarray experiments. In summary, data are routinely
converted into an Excel format (hereafter referred to as the “Standardized
Data Extract” file or “SDE”) that enables even less experienced
scientists to get a first impression of the most prominent gene
expression changes and the overall data reliability within the
experimental system under investigation. In addition to the SDE,
supplemental Excel tools and data visualization formats are generated
and will be further improved in the course of this project.
Explanatory files and manuals will be provided accordingly. Along with
the standardized data files, a report summarizing the most important
aspects of the microarray data analyzed so far will be routinely
generated and provided. All of these files will be deposited in
CytoBASE.
Step 3: Individualized and advanced biological data analyses
Depending on the outcome of Step 2 (see above), the study is then open
for individualized “in-depth analyses”. Due to the diversity of
questions to be addressed and analysis steps to be considered at this
stage, generalized analysis routes can no longer be applied from here
on. Instead, a large number of additional data analysis and
visualization options become available. Examples are the application of
sophisticated filtering strategies, clustering approaches, gene
ontology analyses, gene set enrichment analyses and significance tests,
as well as the generation of heatmaps, figures, tables or text sections
for publications. Furthermore, different kinds of pathway analyses
(e.g. the superimposition of data on pre-determined pathway maps) could
be highly instructive for the progression of an ongoing study.