Genomatix: Expression Analysis for RNASeq Data (only available on GGA)

This task analyzes tags from RNA-Seq experiments and
calculates expression values for all transcripts/loci available in ElDorado.

If two data sets are supplied (e.g. treatment vs. control, or condition A vs. condition B,
or tissue A vs. tissue B), the task can perform a differential analysis, i.e.
calculate lists of up- and down-regulated transcripts/loci between the data sets.

If replicates for treatment and control data are available, the user can select from different
methods like 'DESeq2', 'DESeq' or 'edgeR' to calculate the differential expression.

Optionally, the input reads can also be classified by their association with features from
the ElDorado genome annotation (read statistics for exon, intron, partial, promoter, intergenic region).

Definition: There are two options for this task (for details see below):

"locus-based":
the exons of all transcripts with one GeneId within a Genomatix locus are taken together and this "gene body" is used for counting reads

"transcript-based": all transcripts are considered separately when counting reads in exons

To keep the following descriptions as simple as possible, we use the terms transcript/locus or transcripts/loci
to denote the two types of regions used for the RNA analysis, depending on the parameter setting.

Analysis of one data set

The reads in the input data set are analyzed, and for each transcript/locus in the respective genome
the RPKM value (reads per kilobase of transcript per million mapped reads, Mortazavi et al., 2008) and, additionally,
a normalized expression value (NE) are calculated from the read distribution.
The NE-value is based on the number of reads located in the exons of the transcript/locus
and is normalized to the length of the transcript/locus and the density of the data set.

$$ NE = \frac{\#\mathrm{reads}_{\mathrm{region}}}{\#\mathrm{reads}_{\mathrm{mapped}} \cdot \mathrm{length}_{\mathrm{region}}} \cdot c $$

where NE is the normalized expression or enrichment value,
#reads_region: the reads (sum of base pairs) falling into either the transcript or the cluster region,
#reads_mapped: all mapped reads (in base pairs),
length_region: the transcript or cluster length in base pairs
and c a normalization constant set to 10^7.
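For illustration, the RPKM and NE calculations can be sketched in a few lines of Python. This is a minimal sketch, not the Genomatix implementation: it assumes plain read counts for RPKM, base-pair sums for NE (as in the definitions above) and c = 10^7; the function names rpkm and normalized_expression and the example numbers are hypothetical.

```python
def rpkm(reads_in_region: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    # 1e9 combines the per-kilobase (1e3) and per-million-reads (1e6) scaling.
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)


def normalized_expression(region_bp: int, mapped_bp: int, region_length_bp: int,
                          c: float = 1e7) -> float:
    """NE value: base pairs of reads in the region, divided by the base pairs of
    all mapped reads and by the region length, scaled by the constant c."""
    return region_bp / (mapped_bp * region_length_bp) * c


# Made-up example: 100 reads on a 1,000 bp transcript, 1 million mapped reads.
print(rpkm(100, 1_000, 1_000_000))                  # 100.0
# Made-up example: 5,000 bp of reads in the region, 10^8 bp mapped in total.
print(normalized_expression(5_000, 10**8, 1_000))   # 0.5
```

Both measures divide by region length and by library size; they differ mainly in whether reads are counted as reads (RPKM) or as covered base pairs (NE, as defined above).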

NE-values are provided for the whole transcript/locus and for the most and least expressed exon
of the transcript/locus. The results are summarized in a statistical overview.
Additionally, a list with the GeneIds of the genes with the highest expression values can be downloaded or used directly within the Genomatix Pathway System.
If replicates for one condition are provided, the analysis is done separately for each input
file. This allows comparison of the expression across replicates.

If the expression values for two different conditions (here called "treatment" and "control" for simplicity)
are to be compared, the following statistical testing methods for evaluating differential expression are available: Audic-Claverie, DESeq, DESeq2 and edgeR.

While the Audic-Claverie method does not handle replicates, 'DESeq2', 'DESeq' and 'edgeR' were developed specifically for
replicate data. Moreover, edgeR cannot be used if no replicates are available.

Audic and Claverie introduced a formula to compute a conditional probability for observing N reads (treatment)
in a class given that M reads were observed before (control).
These p-values, in combination with the Genomatix normalized expression (NE) value,
are used to evaluate differential expression.
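The Audic-Claverie probability can be sketched in Python. If x reads were observed in the control library of size N1, the probability of observing y reads in the treatment library of size N2 is p(y | x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1)). The sketch works in log space so that the factorials do not overflow. How Genomatix combines these p-values with the NE value, and whether a one- or two-sided test is used, is not described here; the one-sided upper-tail p-value and the function names are assumptions made for illustration.

```python
import math


def audic_claverie_prob(y: int, x: int, n_treat: float, n_ctrl: float) -> float:
    """P(y reads in treatment | x reads in control) after Audic & Claverie.

    n_treat (N2) and n_ctrl (N1) are the library sizes (total mapped reads)."""
    r = n_treat / n_ctrl
    # log of r**y * (x+y)! / (x! * y! * (1+r)**(x+y+1)); lgamma(k + 1) == log(k!)
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log1p(r))
    return math.exp(log_p)


def audic_claverie_upper_pvalue(y: int, x: int, n_treat: float, n_ctrl: float) -> float:
    """One-sided p-value P(Y >= y | x), i.e. evidence for more reads in treatment.

    Computed as 1 - P(Y < y); adequate for a sketch, but imprecise for tiny p."""
    below = sum(audic_claverie_prob(k, x, n_treat, n_ctrl) for k in range(y))
    return max(0.0, 1.0 - below)


# Hypothetical example: 30 reads in treatment vs. 10 in control, equal library sizes.
print(audic_claverie_upper_pvalue(30, 10, 1_000_000, 1_000_000))
```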

The 'DESeq2', 'DESeq' and 'edgeR' methods model count data (here the number of reads from an RNA-Seq experiment mapped to a transcript) by a
negative binomial distribution. The parameters of the distribution (mean and dispersion) are estimated from the data,
i.e. from the read counts in the input files. Each method computes a measure of read abundance, i.e. expression level
(called 'base mean' or 'mean of normalized counts' in DESeq/DESeq2, and 'concentration' or 'counts-per-million' in edgeR) for each transcript and applies a
hypothesis test to each transcript to evaluate differential expression. In particular, the three methods determine an (adjusted) p-value and a log2 fold change
(in expression level) for each transcript.
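To make the negative binomial model concrete: with mean mu and dispersion alpha, the variance is mu + alpha * mu^2, so it grows faster than the Poisson variance mu. A minimal Python sketch of the log-probability in this mean/dispersion parameterization follows; the function name nb_logpmf is hypothetical, and DESeq, DESeq2 and edgeR use their own internal implementations.

```python
import math


def nb_logpmf(k: int, mu: float, alpha: float) -> float:
    """Log-probability of count k under a negative binomial distribution with
    mean mu and dispersion alpha > 0 (variance = mu + alpha * mu**2)."""
    size = 1.0 / alpha            # 'size' parameter; alpha -> 0 approaches Poisson
    p = size / (size + mu)        # probability parameter of the (size, p) form
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))


# Mean 50 with dispersion 0.1: variance = 50 + 0.1 * 50**2 = 300 (Poisson would give 50).
probs = [math.exp(nb_logpmf(k, 50.0, 0.1)) for k in range(2000)]
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
```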

One parameter can be set for DESeq: the dispersion estimates are found by fitting a curve through the per-transcript dispersion estimates. The way this
fitting is done can be specified to be either 'parametric' (the default in DESeq) or 'local'.
Default settings are used for the other parameters, in particular single pooled values are used as empirical dispersion estimates, and the maximum of the empirical
and fitted values is used as the dispersion for a transcript or cluster. If there are no replicates, the settings are changed to the 'blind' method for computing the empirical
dispersion estimates, and the fitted dispersion values are used. Sometimes, the parametric fitting fails, and in this case, the analysis should be
rerun with the 'fitting method' set to 'local'. For details please refer to the DESeq vignette.
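The 'parametric' fit in DESeq describes the dispersion as a function of the mean of the form a0 + a1 / mean. The sketch below fits that curve shape by ordinary least squares and then applies the 'maximum' rule described above. It is a simplified stand-in: DESeq itself fits the curve with a gamma-family GLM and iterative outlier removal, and the function names and example values here are hypothetical.

```python
def fit_parametric_dispersion(means, disps):
    """Least-squares fit of disp = a0 + a1 / mean (simplified; DESeq uses a gamma GLM)."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(disps) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, disps))
    a1 = sxy / sxx                # extra-Poisson term, scales with 1 / mean
    a0 = y_bar - a1 * x_bar       # asymptotic dispersion for highly expressed transcripts
    return a0, a1


def shared_dispersion(means, empirical, a0, a1):
    """'maximum' sharing: per transcript, the larger of empirical and fitted dispersion."""
    return [max(e, a0 + a1 / m) for m, e in zip(means, empirical)]


# Hypothetical per-transcript values: mean of normalized counts and empirical dispersion.
means = [1.0, 2.0, 4.0, 8.0, 16.0]
empirical = [0.60, 0.20, 0.18, 0.11, 0.08]
a0, a1 = fit_parametric_dispersion(means, empirical)
final = shared_dispersion(means, empirical, a0, a1)
```

Taking the maximum is a conservative choice: a transcript whose own scatter is larger than the trend keeps its larger dispersion, which tends to reduce false positives.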

For DESeq2, two parameters can be set: the test for differential expression can be done either with a Wald test or with a likelihood-ratio test.
The former is the default testing method in DESeq2, while the latter is the one used for DESeq. The other settable parameter is, as for DESeq, the
fitting method used in dispersion estimation. See the DESeq2 vignette for details.
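Both DESeq2 tests end in a p-value per transcript. The sketch below shows only this final step, assuming a simple two-group (treatment vs. control) design: the Wald test divides the estimated log2 fold change by its standard error and compares the result to a standard normal distribution; the likelihood-ratio test compares the fitted models with and without the condition term using a chi-square distribution with one degree of freedom. The function names are hypothetical, and fitting the models themselves is not shown.

```python
import math


def wald_pvalue(log2_fold_change: float, std_error: float) -> float:
    """Two-sided Wald p-value: z = estimate / standard error vs. a standard normal."""
    z = log2_fold_change / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))


def lrt_pvalue(loglik_full: float, loglik_reduced: float) -> float:
    """Likelihood-ratio test for dropping one model term (here: the condition).

    The statistic 2 * (ll_full - ll_reduced) is compared to a chi-square
    distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(s / 2))."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(stat / 2.0))


# A log2 fold change of 1.96 with standard error 1.0 gives a p-value close to 0.05.
print(wald_pvalue(1.96, 1.0))
```

With one degree of freedom the two tests agree when the LRT statistic equals z squared; in practice they are computed from different model fits and usually differ somewhat.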

edgeR normalizes the count data using the TMM (trimmed mean of M-values) method introduced by Robinson and Oshlack. All parameters used in the edgeR
algorithm are set to their respective default values. In particular, tagwise (i.e. per-transcript) dispersion estimation is used, with the tagwise dispersions squeezed towards the
common dispersion, as described in the edgeR vignette.
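The idea of TMM can be sketched as follows: for each transcript, compute the log-ratio M and the average log-abundance A of one sample against a reference sample, discard the most extreme values of both (edgeR's defaults trim 30% of M-values and 5% of A-values at each end), and average the remaining M-values. This is a simplified, unweighted variant written for illustration; edgeR's own function (calcNormFactors) additionally weights the M-values by their precision and rescales the factors across samples, so its numbers will differ. The function name tmm_factor is hypothetical.

```python
import math


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified (unweighted) TMM scaling factor of sample `obs` relative to `ref`."""
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals = [], []
    for o, r in zip(obs, ref):
        if o > 0 and r > 0:                          # M and A undefined for zero counts
            po, pr = o / n_obs, r / n_ref
            m_vals.append(math.log2(po / pr))          # M: log-ratio
            a_vals.append(0.5 * math.log2(po * pr))    # A: average log-abundance
    n = len(m_vals)
    lo_m, lo_a = int(n * logratio_trim), int(n * sum_trim)
    by_m = sorted(range(n), key=lambda i: m_vals[i])[lo_m:n - lo_m]
    by_a = sorted(range(n), key=lambda i: a_vals[i])[lo_a:n - lo_a]
    keep = set(by_m) & set(by_a)                     # survive both trims
    if not keep:
        return 1.0                                   # sketch's fallback: no scaling
    return 2 ** (sum(m_vals[i] for i in keep) / len(keep))


# Hypothetical counts: sample 1 equals the reference except for one huge transcript.
reference = [10] * 100
sample = [10] * 99 + [1000]
factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
```

Here the single highly expressed transcript inflates the library size of the sample by almost a factor of 2; the trimmed factor (about 0.50) compensates, so the other 99 transcripts are not wrongly reported as down-regulated.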

Before the analysis, all transcripts without any mapped reads are removed from the dataset, i.e. all transcripts in the input file
91.input_replicate_analysis that have a read count of '0' in all samples. In the output file
92.output_replicate_analysis, these transcripts are listed with the value 'NA' in every output column (p-value, fold change etc.) except 'id'.
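The filtering step can be sketched like this. The dictionary layout and the function names are hypothetical; the sketch only mirrors the behaviour described above, where transcripts with zero reads in every sample are excluded from the test and reported with 'NA' values.

```python
def split_zero_count_transcripts(counts):
    """counts: dict mapping transcript id -> list of read counts (one per sample).

    Returns (kept, dropped): the transcripts passed to the test method and the
    ids of transcripts with a read count of 0 in every sample."""
    kept = {tid: c for tid, c in counts.items() if any(c)}
    dropped = [tid for tid, c in counts.items() if not any(c)]
    return kept, dropped


def na_rows(dropped, columns=("p-value", "adj. p-value", "log2FoldChange")):
    """Output rows for the removed transcripts: every column except 'id' is 'NA'."""
    return [{"id": tid, **{col: "NA" for col in columns}} for tid in dropped]


# Hypothetical counts for two treatment and two control samples.
counts = {"T1": [0, 0, 0, 0], "T2": [0, 3, 0, 1], "T3": [12, 9, 15, 11]}
kept, dropped = split_zero_count_transcripts(counts)
rows = na_rows(dropped)
```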

For defining up- and down-regulated transcripts between two conditions or samples,
the following criteria are used (parameters set by the user):

a minimum log2 fold change in expression level of the transcript in set1 ("treatment") versus set2 ("control").
The measure of expression level, or read abundance, varies by the chosen analysis method.

a significance level used as a cut-off for the adjusted p-value computed by the chosen method.
Each of the methods applies the multiple testing correction (FDR control) introduced by Benjamini and Hochberg.
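The Benjamini-Hochberg adjustment itself is short: sort the p-values, scale each by m/rank (m = number of tests), and enforce monotonicity starting from the largest p-value. A minimal Python sketch follows; the function name is hypothetical, and it is meant to give the same values as R's p.adjust(p, method = "BH").

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0                                # adjusted values never exceed 1
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top                     # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Four hypothetical raw p-values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```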

Note that the first input set is regarded as "treatment", whereas the second input file is used as "control",
i.e. "up-regulation" refers to a higher expression in set1 than in set2.
Also note that the direction of up- and down-regulation will change if the two data sets are
exchanged in the input.

To calculate the list of up-regulated genes, all up-regulated alternative transcripts of a gene
are used to calculate a mean log2 fold change in expression level. The gene list containing GeneId, Symbol and mean log2 fold change
is then sorted by the highest log2 fold change. The top 50 genes are displayed in
the output; the complete list can be downloaded and used as input data, e.g. for the Genomatix Pathway System.
The list of down-regulated genes is calculated correspondingly, using all down-regulated alternative transcripts of a gene.
The program also gives the list of genes that are both up- and down-regulated,
i.e. those genes where some alternative transcripts are up-regulated and some others are down-regulated at the same time. See more details in the program output section below.
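The gene-level aggregation can be sketched in Python as follows. The input tuples and the function name gene_lists are hypothetical; the sketch only assumes that each transcript already carries its GeneId, symbol, log2 fold change and regulation call ('up', 'down' or 'no').

```python
from collections import defaultdict
from statistics import mean


def gene_lists(transcripts):
    """transcripts: iterable of (gene_id, symbol, log2_fold_change, regulation).

    Returns up-regulated genes (highest mean log2 FC first), down-regulated genes
    (lowest mean log2 FC first) and genes with transcripts in both directions."""
    up, down, symbols = defaultdict(list), defaultdict(list), {}
    for gene_id, symbol, lfc, regulation in transcripts:
        symbols[gene_id] = symbol
        if regulation == "up":
            up[gene_id].append(lfc)
        elif regulation == "down":
            down[gene_id].append(lfc)
    # Mean over the regulated alternative transcripts of each gene only.
    up_list = sorted(((g, symbols[g], mean(v)) for g, v in up.items()),
                     key=lambda row: row[2], reverse=True)
    down_list = sorted(((g, symbols[g], mean(v)) for g, v in down.items()),
                       key=lambda row: row[2])
    both = sorted(set(up) & set(down))
    return up_list, down_list, both


# Hypothetical transcript-level results.
data = [(1, "A", 2.0, "up"), (1, "A", 4.0, "up"), (2, "B", 1.5, "up"),
        (2, "B", -2.0, "down"), (3, "C", -1.0, "down"), (3, "C", 0.1, "no")]
up_list, down_list, both = gene_lists(data)
```

Slicing up_list[:50] would correspond to the top 50 genes displayed in the output.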

Input data are accepted in
BED / bigBed file format or
BAM file format containing the input regions.
For some tasks BAM support might not be available.
The maximum number of input regions and their maximum length can differ for the various tasks.
The limits are usually shown on top of the input pages.

Within this section you can either

choose from previously uploaded BED/BAM files

or add a new BED or BAM file to the list (by clicking "Add BED/BAM file...")

For tasks that allow choosing replicate data as input, you can use the Shift/Ctrl keys to select multiple files
from the list. All selected files will then be treated as replicates.

When adding a new file, a new window will open, asking you to either

upload one or several BED/BAM files from your local computer

or import one or several BED/BAM files from the GMS (see more details)

or import one or several BED/BAM files from the GGA (see more details)

For the new BED/BAM files, you will have to select the correct organism, as the
organism and the genome build are associated with the BED file for future use
(the default is your latest choice in the current session).
Note that files critically depend on the underlying genome build,
which can be changed by selecting a different ElDorado version on the top right of the page
before uploading a file.
You can see the list of genomes available in ElDorado.

Note that almost all browsers have a general upload limit of 2 GB,
so files bigger than this size should be zipped before uploading from your local computer.
This restriction does not apply when using the direct import from the GGA/GMS.

Optionally you can specify a name for saving uploaded files on the server,
otherwise the name of the uploaded file will be used.
If several files are uploaded, the string given here will be used as prefix for each file name.

If any of the regions in the input file cannot be completely assigned to the selected genome
(e.g. wrong chromosome numbering or wrong positions within a chromosome),
an error message will appear and the regions will be skipped. If no valid region is found in an uploaded file,
the complete file will be skipped.

After one or several BED/BAM files have been uploaded successfully and the popup window has been closed,
the list of available BED/BAM files will be updated automatically.

Uploaded BED or BAM files can be deleted from the project anytime via the
project management.

Optional control file(s) for differential analysis

If additional input data is available
(e.g. data from a different condition or tissue, here called "control" data),
it can be selected or uploaded here. After the tickbox is checked, an additional selection will appear
(same options as for the "treatment" file(s), see above).
If several BED/BAM files are selected within the scrolling box, they are treated as replicates for the "control" condition.

Differential Analysis Parameters

The differential analysis parameter section will only appear
if at least one control file was selected or uploaded in the section above.
There are four available algorithms for calculating the differential expression/enrichment values:

DESeq (recommended for replicate data, but does work on non-replicates, too)
It is possible to select the 'fitting type' parameter for DESeq, i.e. the way the curve is fitted through the
dispersion estimates. For details on the meaning of this parameter please refer to the DESeq vignette.

DESeq2 (recommended for replicate data, but does work on non-replicates, too)
As for DESeq, it is possible to select the 'fitting type' parameter for DESeq2, i.e. the way the curve is fitted through the
dispersion estimates.
Additionally, DESeq2 offers two alternative methods for testing for differential expression: Wald test and likelihood-ratio test
(with Wald test being the default).
For details on the meaning of the parameters please refer to the DESeq2 vignette.

edgeR (recommended for replicate data, does not work on non-replicates)

Audic-Claverie (does not handle replicates)

The thresholds that define a transcript as differentially expressed (or a region as enriched/depleted) can be set here.
There are two criteria, which are combined (both must be satisfied for differential expression/enrichment):

an adjusted p-value threshold for the significance of observing the detected change

Note that the p-values calculated by the different methods (DESeq/DESeq2, edgeR, Audic-Claverie) can differ.

Also note that setting the p-value threshold to 1 effectively skips this criterion.

a threshold for the log2 fold change of expression/enrichment level
A log2 ratio of 1 is a fold change of 2; a log2 ratio of 0.585 is a fold change of 1.5;
e.g. if the log2 fold change of expression/enrichment is set to ≥ 1, the expression values must go up
by at least 100% to appear in the differentially expressed transcripts/enriched regions list.

The log2 fold change thresholds can be set separately for up- and down-regulation (enrichment/depletion).

Note that by setting the log2 fold change thresholds to 0, fold changes are ignored in the analysis.
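The combination of the two criteria can be sketched as a small decision function. The names and default values are hypothetical, and the sketch rests on assumptions: 'below the significance level' is taken as a strict comparison, the down-regulation threshold is given as a positive number applied to the negative log2 fold change, and transcripts with an undefined (NA) log2 fold change are never called regulated. A log2 threshold t converts to a plain fold change of 2 ** t.

```python
import math


def classify(padj: float, log2fc: float, alpha: float = 0.05,
             min_up: float = 1.0, min_down: float = 1.0) -> str:
    """Return 'up', 'down' or 'no'; both criteria must hold for a regulation call.

    alpha = 1 disables the p-value criterion; min_up = min_down = 0 disables
    the fold-change criterion (any nonzero change in the right direction)."""
    significant = alpha >= 1.0 or padj < alpha   # NaN ('NA') compares as False
    if not significant or math.isnan(log2fc):
        return "no"
    if log2fc > 0 and log2fc >= min_up:           # +Inf counts as up-regulated
        return "up"
    if log2fc < 0 and -log2fc >= min_down:        # -Inf counts as down-regulated
        return "down"
    return "no"


# log2 threshold -> plain fold change: 1 -> 2.0, 0.585 -> roughly 1.5
fold_change_for_0585 = 2 ** 0.585
```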

The expression analysis can be based on different units of underlying data:

Locus-based expression analysis:
The exons of all transcripts with the same GeneID within a Genomatix locus are taken
together and this "gene body" is used for counting reads
(i.e. reads in overlapping exons of transcripts within the same locus are counted once)

Transcript-based expression analysis:
All transcripts are considered separately when counting reads in exons
(and reads within overlapping transcripts/exons might be counted several times)
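The difference between the two counting modes can be sketched for a single gene. This is a simplified illustration, not the Genomatix counting code: it assumes half-open coordinates on one chromosome and strand, and it counts a read for a region if the read overlaps any exon of that region. The function names and example coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Union of half-open [start, end) intervals, e.g. exons of all transcripts of a gene."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current block
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlaps(read, exons):
    """True if the read [start, end) overlaps any of the exons."""
    return any(read[0] < e_end and e_start < read[1] for e_start, e_end in exons)


def count_reads(reads, transcripts):
    """transcripts: dict transcript_id -> list of exons (all from one locus).

    Returns (locus_count, per_transcript_counts)."""
    gene_body = merge_intervals([ex for exons in transcripts.values() for ex in exons])
    locus_count = sum(overlaps(r, gene_body) for r in reads)       # each read once
    per_transcript = {tid: sum(overlaps(r, exons) for r in reads)  # a read may count
                      for tid, exons in transcripts.items()}      # for several transcripts
    return locus_count, per_transcript


transcripts = {"T1": [(100, 200), (300, 400)], "T2": [(150, 250)]}
reads = [(160, 190), (320, 350), (500, 550)]
locus_count, per_transcript = count_reads(reads, transcripts)
```

In the example, the first read lies in the overlap of exons from T1 and T2: it is counted once for the locus but once for each of the two transcripts.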

If the transcript-based expression analysis is checked, the transcripts used for expression analysis
can additionally be constrained by their source (e.g. NCBI RefSeq).
By default, all non-redundant transcripts available in ElDorado are used.
Depending on the organism, several transcript sources are available.
For example, human and mouse transcripts are available from

NCBI RefSeq

Ensembl

NCBI GenBank

For plants, additional sources may be available (e.g. Phytozome for Glycine max).

Read Classification

When checked, a read classification is done for each input file from the input data:
The number of input reads overlapping genomic elements like exons, introns,
promoters and intergenic regions will be given in the result.

Strand Specificity

Check this box if the sequencing experiment was done in a strand specific manner
(depending on library preparation).

Show result directly in browser window
With this option, the URL of the result is shown directly in your browser
window.

Warning: Please use this option
only for analyses which can be performed in a short time.
If the analysis takes longer than the timeout of the webserver, the
connection will be terminated and you will receive an error message
(e.g. "The document contained no data."). In this case, the results will
not be available; please restart the analysis using the option
"Send the URL of the result via email" below.

Send the URL of the result via email
With this option, an email with the URL of the results will be sent
to the user-provided email address when the analysis is finished.

The results will be available for a limited time on our server.
For details of how long your results will be kept please see the result-email.
After that period they will be deleted unless protected in the project management!

For the differential expression analysis, a comparison of the expression values of the two input data sets
(treatment versus control, possibly each with replicate data) is done.
First, the comparison is done on transcript/locus level by the selected method (DESeq, DESeq2, edgeR or Audic-Claverie),
i.e. each transcript/locus is checked for whether it fulfills the user-defined thresholds regarding
the adjusted p-value and the log2 fold change.
Only for the transcript-based analysis option:
In a second step, the analysis is done on gene level (here, a gene denotes a locus with a corresponding GeneId).
To calculate the list of up-regulated genes, all up-regulated alternative transcripts of a gene
are used to calculate a mean log2 fold change of expression level. The resulting gene list containing GeneId, Symbol and
the mean log2 fold change is then sorted by the highest log2 fold change and the top genes are displayed in the output
(see below).

Note that the log2 fold change values cannot be calculated under certain conditions (e.g. if no expression is detected for a
transcript/locus in the control set). Such cases are indicated by a "-Inf", "Inf" or "NA" value in the output.

number of both up- and down-regulated genes,
i.e. those genes where some alternative transcripts are up-regulated and some others are down-regulated at the same time
(only available for transcript-based analysis)

The download links below the numbers allow accessing

tab-separated data files (suffix *.tsv),
containing details like transcript or gene ID,
position on the genome, p-value, log2 fold change for each transcript/locus or gene respectively

Only for the transcript-based analysis option:
To calculate the list of up-regulated genes, all up-regulated alternative transcripts of a gene
are used to calculate a mean log2 fold change of expression level. The resulting gene list containing GeneId, Symbol and
the mean log2 fold change is then sorted by the highest log2 fold change.
The top 50 up-regulated genes are displayed in the output;
the file with details can be downloaded from the overview table (see above).
Note that not all alternative transcripts of a gene are necessarily regulated in the same direction,
thus the total number of alternative transcripts for a gene and the number of up-regulated transcripts for a gene
can differ. Here, the mean log2 fold change refers to all up-regulated alternative transcripts of a certain gene.

The list of down-regulated genes is calculated as for the up-regulated genes, just using all down-regulated alternative
transcripts of a gene and sorting by the lowest mean log2 fold change in expression.

Any subset of the up- and down-regulated gene lists can be used to start Genomatix Pathway System (GePS).
In GePS, the log2 values will be used for coloring the genes in the pathways.
If the expression analysis was not done in the human system, an additional option allows using
the orthologous human genes. This allows a transfer to the human canonical Signal Transduction Pathways available in GePS.

Graphics in PNG-format with a scatter plot of fold-change in expression in treatment versus control (y-axis) against
expression level (x-axis). For edgeR, the measure of expression is the concentration, for DESeq and DESeq2, it is the mean of normalized counts, for
Audic-Claverie the Genomatix NE-value is used. Each data point depicts a transcript, those showing differential
expression (adjusted p-value below significance level) are colored in red. If you specified thresholds for the fold-change, they are shown in the plot as blue dashed lines.

Graphics in PNG-format with a volcano plot of adjusted p-value (y-axis) against fold-change in expression
in treatment versus control (x-axis). For edgeR, the measure of expression is the average counts-per-million,
for DESeq and DESeq2, it is the mean of normalized counts, for Audic-Claverie the Genomatix NE-value is used.
Each data point depicts a transcript.
If you specified thresholds for the fold-change, they are shown in the plot as blue dashed lines.
To avoid problems with the plotting routine, p-values < 1e-311 are omitted from the volcano plot.

Graphics in PNG-format showing the biological coefficients of variation (BCV) against read abundance (counts-per-million).
The BCV is defined as the square root of the dispersion estimated by edgeR. The red line shows the common BCV towards
which the transcript-wise estimates (black dots) are 'squeezed' by the edgeR algorithm.
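The relation behind the BCV plot can be written down directly: for a negative binomial count with mean mu and dispersion phi, the squared coefficient of variation is 1/mu (the Poisson, technical part) plus phi (the biological part), and the BCV is sqrt(phi). A minimal sketch with hypothetical function names:

```python
import math


def bcv(dispersion: float) -> float:
    """Biological coefficient of variation: square root of the NB dispersion."""
    return math.sqrt(dispersion)


def total_cv_squared(mean_count: float, dispersion: float) -> float:
    """Squared CV of an NB count: Poisson part 1/mu plus biological part BCV**2."""
    return 1.0 / mean_count + dispersion


# A dispersion of 0.04 means roughly 20% biological variation between replicates;
# at a mean of 100 reads the Poisson part (0.01) is small compared to it.
example_bcv = bcv(0.04)
example_cv2 = total_cv_squared(100.0, 0.04)
```

This is why the BCV matters most for highly expressed transcripts: as the counts grow, the Poisson term vanishes and the biological variation dominates.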

In case the read statistics option has been checked,
statistics regarding the distribution of the reads for each of the input data sets is given.

Here, each input read is classified either as

intergenic or

exon or

intron or

partial (i.e. overlapping with an exon)

Additionally, a region can be classified as promoter.
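A simplified read classification can be sketched as follows. The exact rules used by Genomatix are not given here, so the sketch rests on assumptions: a read is 'exon' if it lies completely inside one exon, 'partial' if it overlaps an exon without lying completely inside it, 'intron' if it lies within a gene but touches no exon, and 'intergenic' otherwise; promoter overlap is reported as a separate flag. All names and coordinates are hypothetical.

```python
def classify_read(read, exons, genes, promoters=()):
    """read, exons, genes, promoters: half-open (start, end) intervals on one chromosome.

    Returns (read_class, overlaps_promoter)."""
    start, end = read

    def hits(iv):
        return start < iv[1] and iv[0] < end     # any overlap with interval iv

    if any(e_start <= start and end <= e_end for e_start, e_end in exons):
        read_class = "exon"
    elif any(hits(ex) for ex in exons):
        read_class = "partial"
    elif any(hits(g) for g in genes):
        read_class = "intron"
    else:
        read_class = "intergenic"
    return read_class, any(hits(p) for p in promoters)


exons = [(100, 200), (300, 400)]
genes = [(100, 400)]
promoters = [(0, 100)]
```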

A graphical representation shows the enrichment of reads in the classes above
compared to the genomic background (tab "Enrichment")
and the distribution as a pie chart in comparison with the genomic background (tab "General").

A table lists the number of reads overlapping the genomic classes.

Additionally, a table with the distribution of the reads on the different
chromosomes of the genome is available. The content of this table is hidden by
default, but can be shown by clicking the "Show details" link in the header.

If a detailed read classification is desired, this can be done for the input files
with the task Annotation & Statistics.

A table with an overview of the number of loci and transcripts found to be expressed is shown.
Additionally, the minimum, average and maximum Normalized Expression values are given.
(The NE-value is based on the number of reads located in the exons of the transcript and is normalized to the
length of the transcript and the density of the data set (for details see above). For loci, the maximum NE value of all transcripts
within the locus is used.)

Details on NE value distribution

The histogram shows how many transcripts are expressed with a specific
intensity. The histogram displays 50 classes of NE values. Note that the last class sums up
all NE values larger than 1.
This section is hidden by default and can be shown by clicking on the >>>show details<<< link.

Additionally, a BED file with a region for each transcript annotated in ElDorado with the NE-value as score can be
downloaded or saved directly into
the Genomatix Suite project management, to be available for further analysis with other tasks.

Expression Profile for Genes

Here, the 5 genes with the highest expression values are displayed
(the list can be extended to 50).
The complete gene list can be downloaded; it contains the GeneIds and gene symbols
of the genes together
with the number of transcripts for this gene and their normalized expression values
(NE value and RPKM value of the highest expressed transcript
as well as the mean NE and mean RPKM values for all transcripts of the gene).
Additionally, the Genomatix Pathway System
can be started directly from here, allowing the user to select the number
of genes to use as input.
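The per-gene values of the downloadable list can be sketched like this. The input layout and the function name gene_profile are hypothetical, and the sketch assumes that genes are ranked by the NE value of their highest expressed transcript.

```python
from collections import defaultdict
from statistics import mean


def gene_profile(transcripts, top=5):
    """transcripts: iterable of (gene_id, symbol, ne, rpkm), one tuple per transcript.

    Returns rows (gene_id, symbol, n_transcripts, max_ne, rpkm_of_max_ne_transcript,
    mean_ne, mean_rpkm), highest expressed genes first."""
    by_gene, symbols = defaultdict(list), {}
    for gene_id, symbol, ne, rpkm in transcripts:
        by_gene[gene_id].append((ne, rpkm))
        symbols[gene_id] = symbol
    rows = []
    for gene_id, values in by_gene.items():
        best_ne, best_rpkm = max(values)   # transcript with the highest NE value
        rows.append((gene_id, symbols[gene_id], len(values), best_ne, best_rpkm,
                     mean([v[0] for v in values]), mean([v[1] for v in values])))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:top]


# Hypothetical values: gene 1 has two transcripts, gene 2 has one.
profile = gene_profile([(1, "A", 0.5, 10.0), (1, "A", 1.5, 30.0), (2, "B", 0.8, 20.0)])
```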

All resulting data and graphics files can be downloaded as an archive (tar-file).
The expression analysis results for each input data set
can be found in the sub-directories called either
"sample_<nr>" or "ctrl_<nr>".

Also, some additional data files are available
if one of the test methods 'edgeR', 'DESeq' or 'DESeq2' was selected.

Here is an overview of results and their corresponding file names
(for format details see below):

Data files for all analyzed transcripts/loci and for differentially expressed transcripts/loci

The differentially expressed transcripts/loci are the subset of all transcripts/loci, containing only those transcripts/loci
that fulfill the user-defined criteria for the (adjusted) p-value and log2 fold change threshold.
Both the tab-separated file for all analyzed transcripts/loci (default name "10.transcript_summary.tsv"
or "10.loci_summary.tsv", respectively)
and the data file for the differentially expressed transcripts/loci ("11.diff_expressed_transcripts.tsv" or "11.diff_expressed_loci.tsv")
contain the following information.

Note: columns marked with a (*) are only available with the transcript-based analysis option

1:(*) transcript ID (Eldorado)
2:(*) accession number of the transcript (external e.g. RefSeq, Genbank, Ensembl)
3:(*) locus ID (Eldorado)
4: symbol of the gene
5: gene ID (NCBI Entrez Gene, 0 if not available, -2 if ambiguous)
6: contig/chromosome accession number
7: chromosome
8: strand
9: start position of the transcript/locus
10: end position of the transcript/locus (start < end)
11: length of the transcript/locus (sum of exons)
12: number of exons
13: p-value (depends on the selected method)
14: adjusted p-value (depends on the selected method)
15: log2(fold change), logarithmic (base 2) fold-change in read abundance/expression level in treatment
over control (> 0 is enrichment in treatment, < 0 is decrease in treatment).
The computation of the log2(fold change) value depends on the selected method:
Audic-Claverie: log2 (NE value of treatment data set / NE value of control data set)
DESeq/DESeq2: based on base mean values of treatment and control
edgeR: based on concentration values of treatment and control
Note that the log2(fold change) value can be -Inf/+Inf if one of the conditions shows no expression
16: Regulation of treatment (set1) compared to control (set2), (values can be "up", "down", "no")
the following columns depend on the number of input files:
- number of reads for each replicate from the treatment sets and the control sets
- normalized expression value for each replicate from the treatment sets and the control sets
- the mean normalized expression value across the treatment replicates
- the standard deviation of the normalized expression values across the treatment replicates
- the mean normalized expression value across the control replicates
- the standard deviation of the normalized expression values across the control replicates
- RPKM for each replicate from the treatment sets and the control sets
- the mean RPKM across the treatment replicates
- the standard deviation of the RPKM values across the treatment replicates
- the mean RPKM value across the control replicates
- the standard deviation of the RPKM values across the control replicates

The data files for the up-/down-regulated genes (default names "13.diff_expressed_genes_up.tsv" and
"14.diff_expressed_genes_down.tsv") contain data for assessing the variance of expression within the alternative transcripts
of the gene. The following columns are available (tab-separated format):

Note: columns marked with a (*) are only available with the transcript-based analysis option

1: gene ID (NCBI Entrez Gene)
2: symbol of the gene
3:(*) number of alternative transcripts for this gene that are up-/down-regulated
4:(*) total number of alternative transcripts available in the Genomatix annotation for this gene
5: (mean)(*) log2 fold change of up-/down-regulated transcripts/loci
6:(*) min log2 fold change of up-/down-regulated transcripts/loci
7:(*) max log2 fold change of up-/down-regulated transcripts/loci
8:(*) standard deviation across the log2 fold change values of the regulated alternative transcripts
9: (minimum)(*) p-value for the regulated transcripts/loci
10: mean NE(treat.reg.): mean normalized expression value for the regulated transcripts/loci in the
treatment data
11: stddev NE(treat.reg.): standard deviation across the NE values for the regulated transcripts/loci in the
treatment data
12: mean NE(ctrl.reg.): mean normalized expression value for the regulated transcripts/loci in the
control data
13: stddev NE(ctrl.reg.): standard deviation across the NE values for the regulated transcripts/loci in the
control data
14: mean RPKM(treat.reg.): mean RPKM value for the regulated transcripts/loci in the treatment data
15: stddev RPKM(treat.reg.): standard deviation across the RPKM values for the regulated transcripts/loci in the
treatment data
16: mean RPKM(ctrl.reg.): mean RPKM value for the regulated transcripts/loci in the control data
17: stddev RPKM(ctrl.reg.): standard deviation across the RPKM values for the regulated transcripts/loci in the
control data

The tab-separated file for the up- and down-regulated genes
(default name "12.diff_expressed_genes_up_and_down.tsv") contains the following information.

1: gene Id (NCBI Entrez Gene)
2: symbol of the gene
3: total number of alternative transcripts for this gene
4: number of up-regulated transcripts for this gene
5: mean log2 fold change of up-regulated transcripts
6: number of down-regulated transcripts for this gene
7: mean log2 fold change of down-regulated transcripts

The data file for all genes (default name "15.genes_summary.tsv")
contains data for assessing the variance of expression within the alternative transcripts
of all genes (this is mainly for the transcript-based analysis).
The following columns are available (tab-separated format):

Note: columns marked with a (*) are only available with the transcript-based analysis option

1: gene Id (NCBI Entrez Gene)
2: symbol of the gene
3:(*) total number of alternative transcripts for this gene
4: mean normalized expression in all transcripts/loci in treatment file(s)
please note that here the mean NE is calculated across ALL transcripts/loci of the gene across ALL replicates
5: standard deviation of normalized expression of all transcripts/loci in treatment (possibly replicates)
6: mean NE of all transcripts/loci in control file(s)
7: standard deviation of normalized expression of all transcripts/loci in control (possibly replicates)
8: mean RPKM in all transcripts/loci in treatment file(s)
please note that here the mean RPKM is calculated across ALL transcripts/loci of the gene across ALL replicates
9: standard deviation of RPKM of all transcripts/loci in treatment (possibly replicates)
10: mean RPKM of all transcripts/loci in control file(s)
11: standard deviation of RPKM of all transcripts/loci in control (possibly replicates)

Data files if one of the test methods 'edgeR', 'DESeq' or 'DESeq2' was selected

91.input_replicate_analysis
containing the count data that was used as input for the test method

91.input_replicate_analysis.libsize
containing the library size, i.e. the read numbers of the various input files

92.output_replicate_analysis
contains values computed from the R-script, depending on the chosen method for testing differential expression (DESeq, DESeq2 or edgeR).
The input transcripts are filtered such that any transcripts with a read count total of 0 (i.e. no reads in any sample) are omitted
from the analysis. These transcripts are listed in the output file with all values (except the 'id' column) set to 'NA'.

1. id: Genomatix Transcript Id
2. p-value: p-value resulting from the hypothesis test of the selected test method for differential
expression (DESeq, DESeq2 or edgeR; for DESeq2, the p-values are either from a Wald or LRT test)
3. adj. p-value: Benjamini-Hochberg adjusted p-value (from column 2)
4. log2FoldChange: logarithmic (base 2) fold-change in read abundance/expression level in treatment
over control (> 0 is enrichment in treatment, < 0 is decrease in treatment); for DESeq and DESeq2, the
fold-change computation is based on the base mean, for edgeR on the concentration
for DESeq the remaining columns are
5. baseMean: mean expression level across all replicates, treatment and control
6. baseMean control: mean expression level within control group
7. baseMean treatment: mean expression level within treatment group
8. variance ratio control: ratio of the estimate of the base variance of the counts for the control
group and the value predicted with the base variance function; according to the package authors,
a large value may indicate a false hit;
see the R-vignette for details (within the vignette, this value is referred to as 'resVarA')
9. variance ratio treatment: as column 8, just for the treatment group; within the DESeq vignette,
this value is referred to as 'resVarB'
for DESeq2 the remaining columns are
5. baseMean: mean expression level across all replicates, treatment and control
6. standard error
7. Wald/LRT statistic
for edgeR the remaining column is
5. log counts-per-million: expression level, logarithmic (base 2)