Arguments

voom.results

list, an EList object generated using the
voom function.
Entrez Gene IDs should be used as row names.

contrasts

double, an N x L matrix that specifies the contrasts of the
linear model coefficients for
which the test is required. N is the number of experimental conditions and L is
the number of contrasts.

logFC

double, a K x L matrix that holds the log2 fold change of each
gene for each contrast.
K is the number of genes included in the analysis. If logFC=NULL, the logFC
values are
estimated using ebayes for each contrast.

gs.annots

list, a list of objects of class GSCollectionIndex. It is generated
using one of these functions:
buildIdx, buildMSigDBIdx,
buildKEGGIdx,
buildGeneSetDBIdx, and buildCustomIdx.

symbolsMap

dataframe, a K x 2 matrix that stores the gene symbol of each
Entrez Gene ID. It
is used for the heatmap visualization. The order of rows should match that
of
voom.results. Default symbolsMap=NULL.

baseGSEAs

character, a vector of the gene set tests that should be
included in the
ensemble. Type egsea.base() to see the supported GSE methods.
By default, all
supported methods are used.

minSize

integer, the minimum size of a gene set to be included in the
analysis.
Default minSize= 2.

display.top

integer, the number of top gene sets to be displayed in
the EGSEA report.
You can always access the list of all tested gene sets using the returned
gsa list.
Default is 20.

combineMethod

character, determines how to combine p-values from
different
GSEA methods. Type egsea.combine() to see supported methods.

combineWeights

double, a vector that determines how different GSEA methods
will be weighted.
Its values should range between 0 and 1. This option is not currently
supported.

sort.by

character, determines how to order the analysis results in
the stats table. Type
egsea.sort() to see all available options.

logFC.cutoff

numeric, cut-off threshold of logFC, used for the
Significance Score
and Regulation Direction calculations. Default logFC.cutoff=0.

sum.plot.axis

character, the x-axis of the summary plot. All the
values accepted by the
sort.by parameter can be used. Default sum.plot.axis="p.value".

sum.plot.cutoff

numeric, cut-off threshold to filter the gene sets of
the summary plots
based on the values of the sum.plot.axis. Default
sum.plot.cutoff=NULL.

vote.bin.width

numeric, the bin width of the vote ranking. Default
vote.bin.width=5.

num.threads

numeric, number of CPU threads to be used. Default
num.threads=2.

report

logical, whether to generate the EGSEA interactive report. It
takes longer
to run. Default is TRUE.

print.base

logical, whether to write out the results of the
individual GSE methods.
Default FALSE.

verbose

logical, whether to print out progress messages and warnings.

keep.limma

logical, whether to return the results of the limma analysis.

keep.set.scores

logical, whether to calculate the gene set enrichment scores
per sample for the methods that support this option, i.e., "ssgsea".
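
The matrix-shaped arguments above (contrasts, logFC and symbolsMap) must agree with each other in their dimensions. The following is a minimal base R sketch of those shapes only. The condition names, the Entrez Gene IDs, the fold-change values and the symbolsMap column names (FeatureID, Symbols) are illustrative assumptions, not values required by the function.

```r
# Hypothetical design: N = 3 experimental conditions, L = 2 contrasts.
conditions <- c("Control", "TreatA", "TreatB")
contrasts <- matrix(
  c(-1, 1, 0,    # column 1: TreatA vs Control
    -1, 0, 1),   # column 2: TreatB vs Control
  nrow = length(conditions),
  dimnames = list(conditions, c("TreatA.vs.Control", "TreatB.vs.Control"))
)

# Hypothetical set of K = 4 genes, identified by Entrez Gene IDs.
entrez <- c("7157", "672", "1956", "7422")

# K x 2 symbols map (column names are an assumption made for this sketch).
symbolsMap <- data.frame(
  FeatureID = entrez,
  Symbols = c("TP53", "BRCA1", "EGFR", "VEGFA"),
  stringsAsFactors = FALSE
)

# Optional K x L matrix of made-up log2 fold changes, one column per contrast.
logFC <- matrix(
  c(1.2, -0.4, 0.0, 2.1,
    0.3, -1.5, 0.8, -0.2),
  nrow = length(entrez),
  dimnames = list(entrez, colnames(contrasts))
)

# Shape checks: rows follow the genes, columns follow the contrasts.
stopifnot(
  nrow(logFC) == nrow(symbolsMap),
  ncol(logFC) == ncol(contrasts),
  identical(rownames(logFC), symbolsMap$FeatureID)
)
```

In a real analysis the row order of symbolsMap and logFC would follow the rows of the voom.results object rather than a hand-written vector.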

Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment
Analyses, utilizes the
analysis results of twelve prominent GSE algorithms from the literature to
calculate
collective significance scores for gene sets. These methods are:
ora,
globaltest, plage, safe, zscore, gage,
ssgsea,
roast, fry, padog, camera and gsva.
The ora, gage, camera and gsva methods depend on a competitive null
hypothesis, while the
remaining eight methods are based on a self-contained hypothesis.
Conveniently, the
algorithm proposed here is not limited to these twelve GSE methods, and new
GSE tests
can be easily integrated into the framework. This function takes the voom
object and
the contrast matrix as parameters.
The results of EGSEA can be seen using the topSets function.
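
To make the idea of a collective significance score concrete, the sketch below combines the p-values that several base methods might report for one gene set. Fisher's method is used purely as a familiar illustration of p-value combination; it is not claimed to be the EGSEA default or its implementation, and the options actually available are those listed by egsea.combine(). The p-values and the helper name fisher_combine are hypothetical.

```r
# Hypothetical p-values for a single gene set from three base methods.
p_values <- c(camera = 0.04, roast = 0.01, gage = 0.20)

# Fisher's method: X = -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the null, assuming k independent tests.
fisher_combine <- function(p) {
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

combined <- fisher_combine(p_values)
print(combined)
```

Base methods applied to the same expression data are not independent, so a combined value computed this way is best read as a ranking aid rather than an exact p-value.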

The EGSEA report is an interactive HTML report, generated if report=TRUE, that
enables swift navigation through the results of an EGSEA analysis. The following pages
are generated for each gene set collection and contrast/comparison:
1. Stats Table page shows the detailed statistics of the EGSEA analysis for the
display.top gene sets. It shows the EGSEA scores, individual rankings and
additional annotation for each gene set. Hyperlinks to the source of each gene set
can be seen in this table when they are available. The "Direction" column shows the regulation
direction of a gene set which is calculated based on the logFC, which is
either calculated from the limma differential expression analysis or provided by the user.
The logFC.cutoff is applied for this calculation. The calculations of the EGSEA
scores can be seen in the references section. The method topSets can be used to
generate custom Stats Table.
2. Heatmaps page shows the heatmaps of the gene fold changes for the gene sets that are
presented in the Stats Table page. Red indicates up-regulation
while blue indicates down-regulation. Only genes that appear in the input expression/count
matrix are visualized in the heat map. Gene names are coloured based on their
statistical significance in the limma differential expression analysis.
The "Interpret Results" link below each heat map allows the user to download the
original heat map values along with additional statistics from the limma DE analysis
(if available) so that they can be used to perform further analysis in R, e.g., customizing
the heat map visualization. Additional heat maps can be generated and customized
using the method plotHeatmap.
3. Summary Plots page shows the methods ranking plot along with the summary plots of
EGSEA analysis. The method plot uses multidimensional scaling (MDS) to visualize the
ranking of individual methods on a given gene set collection. The summary plots are
bubble plots that visualize the distribution of gene sets based on the EGSEA
Significance Score and another EGSEA score (default, p-value).
Two summary plots are generated: ranking and directional plots. Each gene set is
represented by a bubble, which is coloured based on the EGSEA ranking (in ranking
plots) or gene set regulation direction (in directional plots) and sized based on the
gene set cardinality (in ranking plots) or EGSEA Significance Score (in directional plots).
Since the EGSEA "Significance Score" is proportional to the p-value and the
absolute fold changes, it could be useful to highlight gene sets that
have high Significance scores. The blue labels on the summary plot indicate
gene sets that do not appear in the top 10 list of gene sets based on the "sort.by"
argument (black labels) yet they appear in the top 5 list of gene sets based on
the EGSEA "Significance Score". If two contrasts are provided, the rank is calculated
based on the "comparison" analysis results and the "Significance Score" is calculated
as the mean. If sort.by = NULL, the slot sort.by of the object
is used to order gene sets.
The method plotSummary can be used to customize the Summary plots by
changing the x-axis score
and filtering bubbles based on the values of the x-axis. The method plotMethods can be
used to generate Methods plots.
4. Pathways page shows the KEGG pathways for the gene sets that are presented in the
Stats Table of a KEGG gene set collection. The gene fold changes are overlaid on the
pathway maps and coloured based on the gene regulation direction: blue for down-regulation
and red for up-regulation. The method plotPathway can be used to generate
additional pathway maps. Note that this page only appears if a KEGG gene set collection
is used in the EGSEA analysis.
5. GO Graphs page shows the Gene Ontology graphs for the top 5 GO terms in each of
three GO categories: Biological Processes (BP), Molecular Functions (MF),
and Cellular Components (CC). Nodes are coloured based on the default sort.by
score where red indicates high significance and yellow indicates low significance.
The method plotGOGraph can be used to customize GO graphs by
changing the default sorting score and the number of significant nodes that can be
visualized. It is recommended that a small number of nodes is selected. Note that
this page only appears if a Gene Ontology gene set collection is used, i.e., for
the c5 collection from MSigDB or the gsdbgo collection from GeneSetDB.
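
The labelling rule described for the Summary Plots page (black labels for the top 10 gene sets by the sort.by score, blue labels for sets outside that list that are still in the top 5 by Significance Score) can be sketched in base R as follows. This is an illustration of the rule as stated above, not the package's code. The data frame, its column names and the helper label_colours are hypothetical, and the sort column is assumed to be one where smaller values rank higher, such as a p-value.

```r
# Hypothetical scores for 12 gene sets.
stats <- data.frame(
  gene.set = paste0("set", 1:12),
  p.value = c(0.001, 0.002, 0.005, 0.01, 0.02, 0.03,
              0.04, 0.05, 0.06, 0.07, 0.08, 0.09),
  sig.score = c(90, 20, 85, 10, 30, 15, 25, 5, 12, 8, 70, 95),
  stringsAsFactors = FALSE
)

label_colours <- function(stats, sort.col = "p.value", n.sort = 10, n.sig = 5) {
  # Black labels: top sets by the sorting score (ascending order).
  top.sorted <- head(stats$gene.set[order(stats[[sort.col]])], n.sort)
  # Candidates for blue labels: top sets by Significance Score (descending).
  top.sig <- head(stats$gene.set[order(stats$sig.score, decreasing = TRUE)], n.sig)
  # Blue labels: highly significant sets that the sorting score left out.
  list(black = top.sorted, blue = setdiff(top.sig, top.sorted))
}

labels <- label_colours(stats)
print(labels$blue)  # "set12" "set11"
```

Here set11 and set12 rank last by p-value yet carry high Significance Scores, so they are the ones that would be highlighted in blue.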

Finally, the "Interpret Results" hyperlink in the EGSEA report allows the user to download
the fold changes and limma analysis results and thus improve the interpretation of the results.
Note that the running time of this function increases significantly when
report = TRUE. For example, the analysis in the example section below
was conducted on the 203 signaling and disease KEGG pathways using a MacBook Pro
machine that had a 2.8 GHz Intel Core i7 CPU and 16 GB of RAM. The execution time
ranged from 23.1 seconds (single thread) to 7.9 seconds (16 threads) when the HTML
report generation was disabled. The execution time was 145.5 seconds when the report
generation was enabled using 16 threads.
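
The timings quoted above can be put in proportion with a few lines of arithmetic. The numbers are those reported in the text; the variable names are only for this sketch.

```r
# Execution times in seconds, as quoted above.
single_thread <- 23.1   # report disabled, 1 thread
multi_thread <- 7.9     # report disabled, 16 threads
with_report <- 145.5    # report enabled, 16 threads

speedup <- single_thread / multi_thread        # about 2.9x from 16 threads
efficiency <- speedup / 16                     # about 0.18 per thread
report_overhead <- with_report - multi_thread  # 137.6 seconds spent on the report
report_ratio <- with_report / multi_thread     # about 18.4x longer with the report

print(round(c(speedup = speedup, efficiency = efficiency,
              report_overhead = report_overhead, report_ratio = report_ratio), 2))
```

On this one machine, report generation rather than the enrichment tests dominates the running time, which is a reason to set report = FALSE while exploring parameters.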

Value

A list of elements, each of which has two or three elements that store the top
gene sets, the detailed analysis
results for each contrast, and the comparative analysis results.