There are 98 new software packages, 16 new data experiment packages,
2 new workflows, and many updates and improvements
to existing packages; Bioconductor 3.7 is compatible with R 3.5.0,
and is supported on Linux, 32- and 64-bit Windows, and Mac OS X. This
release will include an updated Bioconductor Amazon Machine Image
and Docker containers.

Getting Started with Bioconductor 3.7

New Software Packages

There are 98 new software packages in this release of Bioconductor.

adaptest
Data-adaptive test statistics represent a general methodology for
performing multiple hypothesis testing on effect sizes while
maintaining honest statistical inference when operating in
high-dimensional settings. The utilities provided here
extend the use of this general methodology to many common data
analytic challenges that arise in modern computational and genomic
biology.

ASICS With a set of pure
metabolite reference spectra, ASICS quantifies concentration of
metabolites in a complex spectrum. The identification of
metabolites is performed by fitting a mixture model to the spectra
of the library with a sparse penalty. The method and its
statistical properties are described in Tardivel et al. (2017)
<doi:10.1007/s11306-017-1244-5>.

bcSeq This Rcpp-based
package implements a highly efficient data structure and algorithm
for performing alignment of short reads from CRISPR or shRNA
screens to a reference barcode library. Sequencing errors are
considered and matching qualities are evaluated based on Phred
scores. A Bayes’ classifier is employed to predict the originating
barcode of a read. The package supports provision of user-defined
probability models for evaluating matching qualities. The package
also supports multi-threading.
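
The idea of scoring a read against candidate barcodes with Phred
qualities can be sketched as follows. This is only an illustration in
plain Python, not the bcSeq API or its probability model: it assumes
equal-length read and barcodes, a uniform prior over barcodes, and that
a sequencing error is equally likely to produce any of the three other
bases. The function names are hypothetical. The only standard fact used
is the Phred definition, error probability = 10^(-Q/10).

```python
def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def barcode_posteriors(read, quals, barcodes):
    """Posterior probability of each barcode given one read.

    Assumptions (illustrative only): uniform prior over barcodes,
    independent positions, and a mismatch explained by an error that
    hits each of the 3 alternative bases with equal probability.
    """
    likelihoods = {}
    for name, seq in barcodes.items():
        like = 1.0
        for base, q, ref in zip(read, quals, seq):
            e = phred_to_error(q)
            like *= (1 - e) if base == ref else e / 3
        likelihoods[name] = like
    total = sum(likelihoods.values())
    return {name: like / total for name, like in likelihoods.items()}


# Example: a high-quality read that matches barcode "b1" exactly.
library = {"b1": "ACGT", "b2": "ACGA"}
posteriors = barcode_posteriors("ACGT", [30, 30, 30, 30], library)
best = max(posteriors, key=posteriors.get)
```

A user-defined probability model, as the package description mentions,
would replace the per-position likelihood term in this sketch.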

BEARscc BEARscc is a
noise estimation and injection tool that is designed to assess
putative single-cell RNA-seq clusters in the context of
experimental noise estimated by ERCC spike-in controls.

BiFET BiFET identifies
TFs whose footprints are over-represented in target regions
compared to background regions after correcting for the bias
arising from the imbalance in read counts and GC contents between
the target and background regions. For a given TF k, BiFET tests
the null hypothesis that the target regions have the same
probability of having footprints for the TF k as the background
regions while correcting for the read count and GC content bias.
For this, we use the number of target regions with footprints for
TF k, t_k as a test statistic and calculate the p-value as the
probability of observing t_k or more target regions with footprints
under the null hypothesis.

BiocOncoTK Provide
a central interface to various tools for genome-scale analysis of
cancer studies.

CellScore The
CellScore package contains functions to evaluate the cell identity
of a test sample, given a cell transition defined with a starting
(donor) cell type and a desired target cell type. The evaluation is
based upon a scoring system, which uses a set of standard samples
of known cell types, as the reference set. The functions have been
carried out on a large set of microarray data from one platform
(Affymetrix Human Genome U133 Plus 2.0). In principle, the method
could be applied to any expression dataset, provided that there are
a sufficient number of standard samples and that the data are
normalized.

ChIC Quality control
pipeline for ChIP-seq data using a comprehensive set of quality
control metrics, including previously proposed metrics as well as
novel ones, based on local characteristics of the enrichment
profile. The framework allows assessing quality of samples with
sharp or broad enrichment profiles, whereas previously proposed
metrics did not take this into account. ChIC provides a
reference compendium of quality control metrics and trained machine
learning models for scoring samples.

ChIPSeqSpike
Chromatin Immuno-Precipitation followed by Sequencing (ChIP-Seq) is
used to determine the binding sites of any protein of interest,
such as transcription factors or histones with or without a
specific modification, at a genome scale. The many steps of the
protocol can introduce biases that make ChIP-Seq more qualitative
than quantitative. For instance, it was shown that global histone
modification differences are not caught by traditional downstream
data normalization techniques. A case study reported no differences
in histone H3 lysine-27 trimethyl (H3K27me3) upon Ezh2 inhibitor
treatment. To tackle this problem, external spike-in controls were
used to keep track of technical biases between conditions.
Exogenous DNA from a different non-closely related species was
inserted during the protocol to infer scaling factors that enabled
an accurate normalization, thus revealing the inhibitor effect.
ChIPSeqSpike offers tools for ChIP-Seq spike-in normalization.
Ready-to-use scaled bigwig files and scaling factor values are
obtained as output. ChIPSeqSpike also provides tools for ChIP-Seq
spike-in assessment and analysis through a versatile collection of
graphical functions.

CTDquerier Package
to retrieve and visualize data from the Comparative Toxicogenomics
Database (http://ctdbase.org/). The downloaded data is formatted as
DataFrames for further downstream analyses.

CytoDx This package
provides functions that predict clinical outcomes using single cell
data (such as flow cytometry data, RNA single cell sequencing data)
without the requirement of cell gating or clustering.

ddPCRclust The
ddPCRclust algorithm can automatically quantify the CPDs of
non-orthogonal ddPCR reactions with up to four targets. In order to
determine the correct droplet count for each target, it is crucial
to both identify all clusters and label them correctly based on
their position. For more information on what data can be analyzed
and how a template needs to be formatted, please check the
vignette.

DEComplexDisease
It is designed to find the differentially expressed genes (DEGs) for
complex diseases, which are characterized by heterogeneous
genomic expression profiles. Unlike established DEG
analysis tools, it does not assume that patients with a complex
disease share common DEGs. By applying a bi-clustering algorithm,
DECD finds the DEGs shared by as many patients as possible. In this
way, DECD describes the DEGs of a complex disease in a novel syntax,
e.g. a gene list of 200 genes is differentially expressed in 30%
of the studied complex disease patients. Using the DECD analysis
results, users can find the patients affected by the
same mechanism based on the shared signatures.

DEsingle DEsingle is
an R package for differential expression (DE) analysis of
single-cell RNA-seq (scRNA-seq) data. It defines and detects 3
types of differentially expressed genes between two groups of
single cells, with regard to different expression status (DEs),
differential expression abundance (DEa), and general differential
expression (DEg). DEsingle employs a Zero-Inflated Negative Binomial
model to estimate the proportion of real and dropout zeros and to
define and detect the 3 types of DE genes. Results showed that
DEsingle outperforms existing methods for scRNA-seq DE analysis,
and can reveal different types of DE genes that are enriched in
different biological functions.

diffcoexp A tool for
the identification of differentially coexpressed links (DCLs) and
differentially coexpressed genes (DCGs). DCLs are gene pairs with
significantly different correlation coefficients under two
conditions. DCGs are genes with significantly more DCLs than by
chance.
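
One common way to test whether two correlation coefficients differ is
Fisher's z-transformation. The sketch below shows that test in plain
Python for a single gene pair measured under two conditions. Whether
diffcoexp uses exactly this test is not stated here, so treat it as an
assumption; the function names are hypothetical and multiple-testing
correction is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 via Fisher's z-transformation.

    r1, r2 are sample correlations from n1 and n2 samples (each n > 3).
    Returns the z statistic and its two-sided normal p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Example: a gene pair strongly correlated in one condition and
# anti-correlated in the other would be a candidate DCL.
z, p = fisher_z_test(0.9, 20, -0.9, 20)
```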

dmrseq This package
implements an approach for scanning the genome to detect and
perform accurate inference on differentially methylated regions
from Whole Genome Bisulfite Sequencing data. The method is based on
comparing detected regions to a pooled null distribution, which can
be implemented even when as few as two samples per population are
available. Region-level statistics are obtained by fitting a
generalized least squares (GLS) regression model with a nested
autoregressive correlated error structure for the effect of
interest on transformed methylation proportions.

DominoEffect The
functions support identification and annotation of hotspot residues
in proteins. These are individual amino acids that accumulate
mutations at a much higher rate than their surrounding regions.

drawProteins This
package draws protein schematics from Uniprot API output. From the
JSON returned by the GET command, it creates a dataframe from the
Uniprot Features API. This dataframe can then be used by geoms
based on ggplot2 and base R to draw protein schematics.

DropletUtils
Provides a number of utility functions for handling single-cell
(RNA-seq) data from droplet technologies such as 10X Genomics. This
includes data loading, identification of cells from empty droplets,
removal of barcode-swapped pseudo-cells, and downsampling of the
count matrix.

enrichplot The
‘enrichplot’ package implements several visualization methods for
interpreting functional enrichment results obtained from ORA or
GSEA analysis. All the visualization methods are developed based on
‘ggplot2’ graphics.

FELLA Enrichment of
metabolomics data using KEGG entries. Given a set of affected
compounds, FELLA suggests affected reactions, enzymes, modules and
pathways using label propagation in a knowledge model network. The
resulting subnetwork can be visualised and exported.

GARS Feature selection
aims to identify and remove redundant, irrelevant and noisy
variables from high-dimensional datasets. Selecting informative
features affects the subsequent classification and regression
analyses by improving their overall performances. Several methods
have been proposed to perform feature selection: most of them
rely on univariate statistics, correlation, entropy measurements
or the usage of backward/forward regressions. Herein, we propose an
efficient, robust and fast method that adopts stochastic
optimization approaches for high-dimensional data. GARS is an innovative
implementation of a genetic algorithm that selects robust features
in high-dimensional and challenging datasets.

GateFinder Given a
vector of cluster memberships for a cell population, identifies a
sequence of gates (polygon filters on 2D scatter plots) for
isolation of that cell type.

GDCRNATools This
is an easy-to-use package for downloading, organizing, and
integratively analyzing RNA expression data in GDC with an emphasis
on deciphering the lncRNA-mRNA related ceRNA regulatory network in
cancer. Three databases of lncRNA-miRNA interactions including
spongeScan, starBase, and miRcode, as well as three databases of
mRNA-miRNA interactions including miRTarBase, starBase, and miRcode
are incorporated into the package for ceRNAs network construction.
limma, edgeR, and DESeq2 can be used to identify differentially
expressed genes/miRNAs. Functional enrichment analyses including
GO, KEGG, and DO can be performed based on the clusterProfiler and
DO packages. Both univariate CoxPH and KM survival analyses of
multiple genes can be implemented in the package. Besides some
routine visualization functions such as volcano plot, bar plot, and
KM plot, a few simple Shiny apps are developed to facilitate
visualization of results on a local webpage.

GDSArray GDS files
are widely used to represent genotyping or sequence data. The
GDSArray package implements the GDSArray class to represent nodes
in GDS files in a matrix-like representation that allows easy
manipulation (e.g., subsetting, mathematical transformation) in
R. The data remains on disk until needed, so that very large
files can be processed.

GeneStructureTools
GeneStructureTools can be used to create in silico alternative
splicing events, and analyse potential effects this has on
functional gene products.

gep2pep Pathway
Expression Profiles (PEPs) are based on the expression of pathways
(defined as sets of genes) as opposed to individual genes. This
package converts gene expression profiles to PEPs and performs
enrichment analysis of both pathways and experimental conditions,
such as “drug set enrichment analysis” and “gene2drug” drug
discovery analysis respectively.

GOfuncR GOfuncR
performs a gene ontology enrichment analysis based on the ontology
enrichment software FUNC. GO-annotations are obtained from
OrganismDb or OrgDb packages (‘Homo.sapiens’ by default); the
GO-graph is included in the package and updated regularly
(10-Apr-2018). GOfuncR provides the standard candidate vs.
background enrichment analysis using the hypergeometric test, as
well as three additional tests: (i) the Wilcoxon rank-sum test that
is used when genes are ranked, (ii) a binomial test that is used
when genes are associated with two counts and (iii) a Chi-square or
Fisher’s exact test that is used in cases when genes are associated
with four counts. To correct for multiple testing and
interdependency of the tests, family-wise error rates are computed
based on random permutations of the gene-associated variables.
GOfuncR also provides tools for exploring the ontology graph and
the annotations, and options to take gene-length or spatial
clustering of genes into account. From version 0.99.14 on it is
also possible to provide custom annotations and ontologies.
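
The candidate vs. background test mentioned above is, at its core, an
upper-tail hypergeometric probability. The following is a minimal
sketch in plain Python, not the GOfuncR API: the function and parameter
names are hypothetical, and the permutation-based family-wise error
rate correction described above is not shown.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """P(X >= hits) for X ~ Hypergeometric.

    population: number of background genes (candidates included)
    annotated:  background genes annotated to the GO category
    drawn:      number of candidate genes
    hits:       candidate genes annotated to the GO category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Example: 3 of 3 candidate genes fall in a category that covers
# only 3 of 10 background genes.
p = hypergeom_enrichment_p(population=10, annotated=3, drawn=3, hits=3)
```

Exact integer arithmetic via `math.comb` avoids rounding issues for
small inputs; for genome-scale inputs a log-space implementation may be
preferable.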

GSEABenchmarkeR
The GSEABenchmarkeR package implements an extendable framework for
reproducible evaluation of set- and network-based methods for
enrichment analysis of gene expression data. This includes support
for the efficient execution of these methods on comprehensive real
data compendia (microarray and RNA-seq) using parallel computation
on standard workstations and institutional computer grids. Methods
can then be assessed with respect to runtime, statistical
significance, and relevance of the results for the phenotypes
investigated.

gsean Biological
molecules in a living organism seldom work individually. They
usually interact with each other in a cooperative way. Biological
processes are too complicated to understand without considering such
interactions. Thus, network-based procedures can be seen as
powerful methods for studying complex processes. However, many
methods are devised for analyzing individual genes. It is said that
techniques based on biological networks such as gene co-expression
are more precise ways to represent information than those using
lists of genes only. This package aims to integrate gene
expression and biological networks. A biological network is
constructed from gene expression data and it is used for Gene Set
Enrichment Analysis.

hipathia Hipathia is
a method for the computation of signal transduction along signaling
pathways from transcriptomic data. The method is based on an
iterative algorithm which is able to compute the signal intensity
passing through the nodes of a network by taking into account the
level of expression of each gene and the intensity of the signal
arriving at it. It also provides a new approach to functional
analysis, allowing computation of the signal arriving at the
functions annotated to each pathway.

igvR Access to igv.js,
the Integrative Genomics Viewer running in a web browser.

IMMAN Reconstructing
Interlog Protein Network (IPN) integrated from several
Protein-Protein Interaction Networks (PPINs). Using this package,
different PPINs can be overlaid to mine conserved common networks
between diverse species.

InTAD The package is
focused on the detection of correlation between expressed genes and
selected epigenomic signals i.e. enhancers obtained from ChIP-seq
data within topologically associated domains (TADs). Various
parameters can be controlled to investigate the influence of
external factors and visualization plots are available for each
analysis step.

iSEE Provides functions
for creating an interactive Shiny-based graphical user interface
for exploring data stored in SummarizedExperiment objects,
including row- and column-level metadata. Particular attention is
given to single-cell data in a SingleCellExperiment object with
visualization of dimensionality reduction results.

iteremoval The
package provides a flexible algorithm to screen features of two
distinct groups in consideration of overfitting and overall
performance. It was originally tailored for methylation locus
screening of NGS data, and it can also be used as a generic method
for feature selection. Each step of the algorithm provides a
default method for simple implementation, and the method can be
replaced by a user-defined function.

kissDE Retrieves
condition-specific variants in RNA-seq data (SNVs,
alternative-splicings, indels). It has been developed as a
post-treatment of ‘KisSplice’ but can also be used with user’s own
data.

LineagePulse
LineagePulse is a differential expression and expression model
fitting package tailored to single-cell RNA-seq data (scRNA-seq).
LineagePulse accounts for batch effects, drop-out and variable
sequencing depth. One can use LineagePulse to perform longitudinal
differential expression analysis across pseudotime as a continuous
coordinate or between discrete groups of cells (e.g. pre-defined
clusters or experimental conditions). Expression model fits can be
directly extracted from LineagePulse.

MACPET The MACPET
package can be used for binding site analysis for ChIA-PET data.
MACPET reads ChIA-PET data in BAM or SAM format and separates the
data into Self-ligated, Intra- and Inter-chromosomal PETs.
Furthermore, MACPET breaks the genome into regions and applies 2D
mixture models for identifying candidate peaks/binding sites using
skewed generalized Student’s t distributions (SGT). It then uses a
local Poisson model for finding significant binding sites. MACPET
is mainly written in C++, and it supports the BiocParallel package.

MAGeCKFlute
MAGeCKFlute is designed to support downstream analysis,
utilizing the gene summary data provided through MAGeCK or
MAGeCK-VISPR. Quality control, normalization, and screen hit
identification for CRISPR screen data are performed in a pipeline.
Identified hits within the pipeline are categorized based on
experimental design, and are subsequently interpreted by functional
enrichment analysis.

martini martini deals
with the low power inherent to GWAS studies by using prior
knowledge represented as a network. SNPs are the vertices of the
network, and the edges represent biological relationships between
them (genomic adjacency, belonging to the same gene, physical
interaction between protein products). The network is scanned using
SConES, which looks for groups of SNPs maximally associated with
the phenotype, that form a close subnetwork.

mdp The Molecular Degree
of Perturbation webtool quantifies the heterogeneity of samples. It
takes a data.frame of omic data that contains at least two classes
(control and test) and assigns a score to all samples based on how
perturbed they are compared to the controls. It is based on the
Molecular Distance to Health (Pankla et al. 2009), and expands on
this algorithm by adding the options to calculate the z-score using
the modified z-score (using median absolute deviation), change the
z-score zeroing threshold, and look at genes that are most
perturbed in the test versus control classes.
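
The modified z-score and the zeroing threshold mentioned above can be
sketched as follows. This is an illustration in plain Python, not the
mdp package code. It assumes the conventional 1.4826 scaling of the
median absolute deviation (MAD) and a hypothetical default threshold of
2; the package's own constants and defaults are not stated here.

```python
from statistics import median


def modified_z_scores(values, controls, threshold=2.0):
    """Score values against the control class using median and MAD.

    Scores whose absolute value falls below `threshold` are set to
    zero, mirroring the "z-score zeroing threshold" idea. The 1.4826
    factor makes the MAD comparable to a standard deviation under
    normality (an assumption of this sketch).
    """
    center = median(controls)
    mad = 1.4826 * median([abs(c - center) for c in controls])
    if mad == 0:
        raise ValueError("MAD of controls is zero; scores are undefined")
    scores = []
    for v in values:
        z = (v - center) / mad
        scores.append(z if abs(z) >= threshold else 0.0)
    return scores


# Example: one gene, five control samples and two test samples.
control_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
test_expr = [3.0, 10.0]
scores = modified_z_scores(test_expr, control_expr)
```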

MDTS A package for the
detection of de novo copy number deletions in targeted sequencing
of trios with high sensitivity and positive predictive value.

missRows The missRows
package implements the MI-MFA method to deal with missing
individuals (‘biological units’) in multi-omics data integration.
The MI-MFA method generates multiple imputed datasets from a
Multiple Factor Analysis model, then the results are combined
in a single consensus solution. The package provides functions for
estimating coordinates of individuals and variables, imputing
missing individuals, and various diagnostic plots to inspect the
pattern of missingness and visualize the uncertainty due to missing
values.

MSstatsQCgui
MSstatsQCgui is a Shiny app which provides longitudinal system
suitability monitoring and quality control tools for proteomic
experiments.

OmaDB A package for downloading
orthology prediction data from the OMA database.

omicplotR A Shiny
app for visual exploration of omic datasets as compositions, and
differential abundance analysis using ALDEx2. Useful for exploring
RNA-seq, meta-RNA-seq, 16S rRNA gene sequencing with visualizations
such as principal component analysis biplots (coloured using
metadata for visualizing each variable), dendrograms and stacked
bar plots, and effect plots (ALDEx2). Input is a table of counts
and metadata file (if metadata exists), with options to filter data
by count or by metadata to remove low counts, or to visualize
select samples according to selected metadata.

ORFik Tools for
manipulation of RiboSeq, RNASeq and CageSeq data. ORFik is
extremely fast through use of C, data.table and GenomicRanges.
The package supports reassigning transcript starts using
CageSeq data, automatic shifting of RiboSeq reads, finding
Open Reading Frames across whole genomes, and more.

perturbatr
perturbatr does stage-wise analysis of large-scale genetic
perturbation screens for integrated data sets consisting of
multiple screens. For multiple integrated perturbation screens a
hierarchical model that considers the variance between different
biological conditions is fitted. The resulting list of gene effects
is then further extended using a network propagation algorithm to
correct for false negatives.

phantasus Phantasus
is a web-application for visual and interactive gene expression
analysis. Phantasus is based on Morpheus – a web-based software for
heatmap visualisation and analysis, which was integrated with an R
environment via OpenCPU API. Aside from basic visualization and
filtering methods, R-based methods such as k-means clustering,
principal component analysis or differential expression analysis
with limma package are supported.

plyranges A
dplyr-like interface for interacting with the common Bioconductor
classes Ranges and GenomicRanges. By providing a grammatical and
consistent way of manipulating these classes, their accessibility for
new Bioconductor users is hopefully increased.

PowerExplorer
Estimate and predict power among groups and multiple sample sizes
with simulated data. The simulations are based on
distribution parameters estimated from the provided input dataset.

powerTCR This package
provides a model for the clone size distribution of the TCR
repertoire. Further, it permits comparative analysis of TCR
repertoire libraries based on theoretical model fits.

RandomWalkRestartMH
This package performs Random Walk with Restart on multiplex and
heterogeneous networks. It is described in the following article:
“Random Walk With Restart On Multiplex And Heterogeneous Biological
Networks”. https://www.biorxiv.org/content/early/2017/08/30/134734
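
For orientation, the basic Random Walk with Restart iteration on a
single (monoplex) network can be sketched as below. This is plain
Python rather than the RandomWalkRestartMH API, and it deliberately
omits the multiplex and heterogeneous extensions that the package
implements. The function name, the restart value of 0.7, and the
convergence settings are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7,
                             tol=1e-12, max_iter=10_000):
    """Iterate p <- (1 - r) * W p + r * p0 until convergence.

    adjacency: square list-of-lists of non-negative edge weights
    seeds:     set of node indices where the walker restarts
    W is the column-normalized adjacency matrix, so p stays a
    probability distribution over nodes.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        raise ValueError("every node needs at least one edge")
    transition = [[adjacency[i][j] / col_sums[j] for j in range(n)]
                  for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Example: path graph 0 - 1 - 2 with node 0 as the seed.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = random_walk_with_restart(path, seeds={0})
```

Nodes closer to the seed receive higher scores, which is what makes the
resulting vector usable as a proximity ranking.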

RcisTarget
RcisTarget identifies transcription factor binding motifs (TFBS)
over-represented on a gene list. In a first step, RcisTarget
selects DNA motifs that are significantly over-represented in the
surroundings of the transcription start site (TSS) of the genes in
the gene-set. This is achieved by using a database that contains
genome-wide cross-species rankings for each motif. The motifs
are then annotated to TFs, and those that have a high Normalized
Enrichment Score (NES) are retained. Finally, for each motif and
gene-set, RcisTarget predicts the candidate target genes (i.e.
genes in the gene-set that are ranked above the leading edge).

RGMQL This package
brings the GenoMetric Query Language (GMQL) functionalities into
the R environment. GMQL is a high-level, declarative language to
manage heterogeneous genomic datasets for biomedical purposes,
using simple queries to process genomic regions and their metadata
and properties. GMQL adopts algorithms efficiently designed for big
data using cloud-computing technologies (like Apache Hadoop and
Spark) allowing GMQL to run on modern infrastructures, in order to
achieve scalability and high performance. It allows users to create,
manipulate and extract genomic data from different data sources
both locally and remotely. Our RGMQL functions allow complex
queries and processing leveraging on the R idiomatic paradigm. The
RGMQL package also provides a rich set of ancillary classes that
allow sophisticated input/output management and sorting, such as:
ASC, DESC, BAG, MIN, MAX, SUM, AVG, MEDIAN, STD, Q1, Q2, Q3 (and
many others). Note that many RGMQL functions are not directly
executed in the R environment, but are deferred until real execution is
issued.

RNAdecay RNA
degradation is monitored through measurement of RNA abundance after
inhibiting RNA synthesis. This package has functions and example
scripts to facilitate (1) data normalization, (2) data modeling
using constant decay rate or time-dependent decay rate models, (3)
the evaluation of treatment or genotype effects, and (4) plotting
of the data and models. Data Normalization: functions and
scripts simplify normalization to the initial (T0) RNA
abundance and provide a method to correct for artificial inflation
of Reads per Million (RPM) abundance in global assessments as the
total size of the RNA pool decreases. Modeling: Normalized data is
then modeled using maximum likelihood to fit parameters. For making
treatment or genotype comparisons (up to four), the modeling step
models all possible treatment effects on each gene by repeating
the modeling with constraints on the model parameters (i.e., the
decay rate of treatments A and B are modeled once with them being
equal and again allowing them to both vary independently). Model
Selection: The AICc value is calculated for each model, and the
model with the lowest AICc is chosen. Modeling results of selected
models are then compiled into a single data frame. Graphical
Plotting: a function is provided to easily visualize the data and
the selected model using ggplot2 package functions.
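
The model selection step described above can be illustrated with the
standard small-sample corrected AIC, AICc = 2k - 2 ln(L) +
2k(k + 1) / (n - k - 1), where k is the number of fitted parameters,
L the maximized likelihood, and n the number of observations. The
sketch below is plain Python, not the RNAdecay API; the model names and
log-likelihood values are made up for the example.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion for a fitted model."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


def select_model(fits, n):
    """Return the name of the model with the lowest AICc, plus all scores.

    fits maps a model name to (maximized log-likelihood, parameter count).
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one gene: treatments A and B constrained to
# share a decay rate versus allowed to vary independently.
fits = {
    "equal_rates": (-50.0, 2),
    "independent_rates": (-49.5, 3),
}
best, scores = select_model(fits, n=24)
```

In this made-up example the extra parameter of the independent-rates
model does not improve the likelihood enough to offset its penalty, so
the constrained model is preferred.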

rWikiPathways
Use this package to interface with the WikiPathways API.

scFeatureFilter
An R implementation of the correlation-based method developed in
the Joshi laboratory to analyse and filter processed single-cell
RNAseq data. It returns a filtered version of the data containing
only gene expression values unaffected by systematic noise.

scmeth Functions to
analyze methylation data can be found here. Some functions are
relevant for single cell methylation data but most other functions
can be used for any methylation data. A highlight of this workflow is
the comprehensive quality control report.

Sconify This package
does k-nearest neighbor based statistics and visualizations with
flow and mass cytometry data. This gives tSNE maps “fold change”
functionality and provides a data quality metric by assessing
manifold overlap between fcs files expected to be the same. Other
applications using this package include imputation, marker
redundancy, and testing the relative information loss of lower
dimension embeddings compared to the original manifold.

SDAMS This package
utilizes a Semi-parametric Differential Abundance analysis (SDA)
method for metabolomics and proteomics data from mass spectrometry.
SDA is able to robustly handle non-normally distributed data and
provides a clear quantification of the effect size.

SEPIRA SEPIRA (Systems
EPigenomics Inference of Regulatory Activity) is an algorithm that
infers sample-specific transcription factor activity from the
genome-wide expression or DNA methylation profile of the sample.

seqsetvis seqsetvis
enables the visualization and analysis of multiple genomic
datasets. Although seqsetvis was designed for the comparison of
multiple ChIP-seq datasets, this package is domain-agnostic and
allows the processing of multiple genomic coordinate files
(bed-like files) and signal files (bigwig files or bam pileups).

sevenC Chromatin
looping is an essential feature of eukaryotic genomes and can bring
regulatory sequences, such as enhancers or transcription factor
binding sites, into close physical proximity with regulated target
genes. Here, we provide sevenC, an R package that uses protein
binding signals from ChIP-seq and sequence motif information to
predict chromatin looping events. Cross-linking of proteins that
bind close to loop anchors results in ChIP-seq signals at both
anchor loci. These signals are used at CTCF motif pairs together
with their distance and orientation to each other to predict
whether they interact or not. The resulting chromatin loops might
be used to associate enhancers or transcription factor binding
sites (e.g., ChIP-seq peaks) to regulated target genes.

SIAMCAT Pipeline for
Statistical Inference of Associations between Microbial Communities
And host phenoTypes (SIAMCAT). A primary goal of analyzing
microbiome data is to determine changes in community composition
that are associated with environmental factors. In particular,
linking human microbiome composition to host phenotypes such as
diseases has become an area of intense research. For this, robust
statistical modeling and biomarker extraction toolkits are
crucially needed. SIAMCAT provides a full pipeline supporting data
preprocessing, statistical association testing, statistical
modeling (LASSO logistic regression) including tools for evaluation
and interpretation of these models (such as cross validation,
parameter selection, ROC analysis and diagnostic model plots).

signet An R package to
detect selection in biological pathways. Using gene selection
scores and biological pathways data, one can search for
high-scoring subnetworks of genes within pathways and test their
significance.

singleCellTK Run
common single cell analysis directly through your browser including
differential expression, downsampling analysis, and clustering.

SparseSignatures
Point mutations occurring in a genome can be divided into 96
categories based on the base being mutated, the base it is mutated
into and its two flanking bases. Therefore, for any patient, it is
possible to represent all the point mutations occurring in that
patient’s tumor as a vector of length 96, where each element
represents the count of mutations for a given category in the
patient. A mutational signature represents the pattern of mutations
produced by a mutagen or mutagenic process inside the cell. Each
signature can also be represented by a vector of length 96, where
each element represents the probability that this particular
mutagenic process generates a mutation of the 96 above mentioned
categories. In this R package, we provide a set of functions to
extract and visualize the mutational signatures that best explain
the mutation counts of a large number of patients.
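
The 96 categories can be enumerated explicitly. The sketch below, in
plain Python rather than the SparseSignatures API, follows the usual
convention of reporting each substitution from the pyrimidine (C or T)
of the base pair, which gives 6 substitution types times 4 x 4 flanking
bases. The label format `A[C>T]G` and the function names are choices
made for this example.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types once strand-equivalent changes are collapsed
# onto the pyrimidine reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_categories():
    """List the 96 trinucleotide-context categories, e.g. 'A[C>T]G'."""
    return [
        f"{left}[{sub}]{right}"
        for sub in SUBSTITUTIONS
        for left, right in product(BASES, repeat=2)
    ]


def count_vector(observed):
    """Turn a list of category labels into a length-96 count vector."""
    index = {label: i for i, label in enumerate(mutation_categories())}
    counts = [0] * len(index)
    for label in observed:
        counts[index[label]] += 1
    return counts


# Example: three point mutations observed in one tumor.
patient = count_vector(["A[C>T]G", "A[C>T]G", "T[T>C]A"])
```

Stacking one such vector per patient yields the patients-by-96 count
matrix from which signatures are extracted.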

srnadiff Differential
expression of small RNA-seq when reference annotation is not given.

SummarizedBenchmark
This package defines the BenchDesign and SummarizedBenchmark
classes for building, executing, and evaluating benchmark
experiments of computational methods. The SummarizedBenchmark class
extends the RangedSummarizedExperiment object, and is designed to
provide infrastructure to store and compare the results of applying
different methods to a shared data set. This class provides an
integrated interface to store metadata such as method parameters
and software versions as well as ground truths (when these are
available) and evaluation metrics.

TCGAutils A suite of
helper functions for checking and manipulating TCGA data including
data obtained from the curatedTCGAData experiment package. These
functions aim to simplify and make working with TCGA data more
manageable.

TFEA.ChIP Package to
analyze transcription factor enrichment in a gene set using data
from ChIP-Seq experiments.

TissueEnrich The
TissueEnrich package is used to calculate enrichment of
tissue-specific genes in a set of input genes. For example, the
user can input the most highly expressed genes from RNA-Seq data,
or gene co-expression modules to determine which tissue-specific
genes are enriched in those datasets. Tissue-specific genes were
defined by processing RNA-Seq data from the Human Protein Atlas
(HPA) (Uhlén et al. 2015), GTEx (Ardlie et al. 2015), and mouse
ENCODE (Shen et al. 2012) using the algorithm from the HPA (Uhlén
et al. 2015). The hypergeometric test is used to determine if
the tissue-specific genes are enriched among the input genes. Along
with tissue-specific gene enrichment, the TissueEnrich package can
also be used to define tissue-specific genes from expression
datasets provided by the user, which can then be used to calculate
tissue-specific gene enrichments.
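As a rough illustration of the hypergeometric enrichment test mentioned above, the upper-tail probability can be computed directly. This is a sketch in Python and not the package's implementation; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, tissue_specific, input_genes, overlap):
    """P(X >= overlap) when input_genes are drawn without replacement from
    total_genes, of which tissue_specific belong to the tissue of interest."""
    upper = min(tissue_specific, input_genes)
    denominator = comb(total_genes, input_genes)
    tail = sum(
        comb(tissue_specific, k)
        * comb(total_genes - tissue_specific, input_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator
```

A small p-value indicates that the input gene set contains more tissue-specific genes than expected by chance.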

Trendy Trendy
implements segmented (or breakpoint) regression models to estimate
breakpoints which represent changes in expression for each
feature/gene in high throughput data with ordered conditions.

tRNAscanImport
The package imports the result of tRNAscan-SE as a GRanges object.

TTMap TTMap is a
clustering method that groups together samples with the same
deviation in comparison to a control group. It is especially useful
when the data set is small. It is parameter free.

TxRegInfra This
package provides interfaces to genomic metadata employed in
regulatory network creation, with a focus on noSQL solutions.
Currently quantitative representations of eQTLs, DnaseI
hypersensitivity sites and digital genomic footprints are assembled
using an out-of-memory extension of the RaggedExperiment API.

vidger The aim of
vidger is to rapidly generate information-rich visualizations for
the interpretation of differential gene expression results from
three widely-used tools: Cuffdiff, DESeq2, and edgeR.

New Data Experiment Packages

There are 16 new data experiment packages in this release of Bioconductor.

ASICSdata 1D NMR
example spectra and additional data for use with the ASICS package.
Raw 1D Bruker spectral data files were found in the MetaboLights
database (https://www.ebi.ac.uk/metabolights/, study MTBLS1).

hgu133plus2CellScore
The CellScore Standard Dataset contains expression data from a wide
variety of human cells and tissues, which should be used as
standard cell types in the calculation of the CellScore. All data
was curated from public databases such as Gene Expression Omnibus
(https://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress
(https://www.ebi.ac.uk/arrayexpress/). This standard dataset only
contains data from the Affymetrix GeneChip Human Genome U133 Plus
2.0 microarrays. Samples were manually annotated using the database
information or consulting the publications in which the datasets
originated. The sample annotations are stored in the phenoData slot
of the expressionSet object. Raw data (CEL files) were processed
with the affy package to generate present/absent calls (mas5calls)
and background-subtracted values, which were then normalized by the
R-package yugene to yield the final expression values for the
standard expression matrix. The annotation table for the microarray
was retrieved from the BioC annotation package hgu133plus2. All
data are stored in an expressionSet object.

mCSEAdata Data
objects necessary to some mCSEA package functions. There are also
example data objects to illustrate mCSEA package functionality.

MetaGxBreast A
collection of Breast Cancer Transcriptomic Datasets that are part
of the MetaGxData package compendium.

MetaGxOvarian A
collection of Ovarian Cancer Transcriptomic Datasets that are part
of the MetaGxData package compendium.

MetaGxPancreas
A collection of pancreatic Cancer transcriptomic datasets that are
part of the MetaGxData package compendium.

RcisTarget.hg19.motifDBs.cisbpOnly.500bp
RcisTarget databases: Gene-based motif rankings and annotation to
transcription factors. This package contains a subset of 4.6k
motifs (cisbp motifs), scored only within 500bp upstream and the
TSS. See RcisTarget tutorial to download the full databases,
containing 20k motifs and search space up to 10kbp around the TSS.

RGMQLlib A package
that contains the Scala libraries used by the RGMQL package to call
GMQL from R. It contains a scalable data management engine written
in the Scala programming language.

TCGAbiolinksGUI.data
Supporting data for the TCGAbiolinksGUI package. It includes the
following objects: glioma.gcimp.model, glioma.idhwt.model,
glioma.idhmut.model, glioma.idh.mode, probes2rm, maf.tumor,
GDCdisease.

tissueTreg The
package provides ready to use epigenomes (obtained from TWGBS) and
transcriptomes (RNA-seq) from various tissues as obtained in the
study (Delacher and Imbusch 2017, PMID: 28783152). Regulatory T
cells (Treg cells) perform two distinct functions: they maintain
self-tolerance, and they support organ homeostasis by
differentiating into specialized tissue Treg cells. The underlying
dataset characterises the epigenetic and transcriptomic
modifications for specialized tissue Treg cells.

New Workflows

BiocMetaWorkflow
Bioconductor Workflow describing how to use BiocWorkflowTools to work with a
single R Markdown document to submit to both Bioconductor and F1000Research.

simpleSingleCell
This workflow implements a low-level analysis pipeline for scRNA-seq data
using scran, scater and other Bioconductor packages. It describes how to
perform quality control on the libraries, normalization of cell-specific
biases, basic data exploration and cell cycle phase identification. Procedures
to detect highly variable genes, significantly correlated genes and
subpopulation-specific marker genes are also shown. These analyses are
demonstrated on a range of publicly available scRNA-seq data sets.

The first release of this package was made as part of Bioconductor
3.7, in April 2018. The adaptest R package provides routines for the
method first described in the technical manuscript 1 and the
software paper 2: 1. Weixin Cai, Nima S. Hejazi, Alan E. Hubbard.
Data-adaptive statistics for multiple hypothesis testing in
high-dimensional settings. 2. Weixin Cai, Alan E. Hubbard, Nima S.
Hejazi. adaptest: Data-Adaptive Statistics for High-Dimensional
Testing in R.

Add the ability to keep duplicate regions in summarize_categorical()
and plot_categorical(). This is accomplished with the ‘by’ parameter
in the former and by the ‘x’ and ‘fill’ parameters in the latter, and
passing their contents into the ‘.dots’ parameter of
dplyr::distinct_().

Make TxDb and OrgDb packages Suggests instead of Imports. NOTE: This
saves space, but also requires downloading the appropriate packages
as needed.

Add list_env() function to the annotatr_cache environment to see what
custom annotations have been read in and added to the cache.

BUGFIXES

Replace dplyr::summarize_each_() with dplyr::summarize_at() in line
with deprecation in the dplyr package.

robustSmoothSpline() now supports using Tukey’s biweight (in addition
to the already existing L1) estimators. See argument ‘method’. Thanks to
Aaron Lun at the Cancer Research UK Cambridge Institute for adding
this feature.

Changes in version 3.9.0 (2017-10-30):

The version number was bumped for the Bioconductor devel version,
which is now BioC 3.7 for R (>= 3.5.0).

(v. 1.3.38) The … argument is exposed and passed to GET; this includes
bfcadd(), whose … was originally passed to file.copy. This use of …
applies to the bfc functions bfcadd(), bfcupdate() and bfcdownload(),
which could potentially download the file.

biocLite() supports github repositories using the remotes package,
rather than devtools. This change should be transparent to end users.
(From Peter Hickey
https://github.com/Bioconductor/BiocInstaller/issues/4)

plotAnnot: - new “facet” option for running ggplot2::facet_wrap() on
the plots. - better documentation for the “scopes” of the plot. -
removed “customScope” argument: pass a function directly to “scope”
argument instead.

plotCorrelation2: - new methods for SummarizedExperiment, DataFrame,
data.frame and matrix. - Corrected the correlation matrix output
(previously the diagonal values were 0 and the other values were
swapped).

plotReverseCumulatives: - fitInRange accepts a value of “NULL” to
turn off power law fitting. - New “legend” argument to remove legend
when set to “FALSE”. - Axis range and labels can be modified with
xlab/ylab and xlim/ylim. - Ladder steps start on the values instead
of being centered on them.

setColors: allow lowercase in color names.

Changes in version 1.21.4:

Corrected a bug that was crashing CAGEset objects when loading more
than one BAM file.

Load BAM as gapped alignment with readGAlignments() instead of
scanBam(). Without this correction, TSS position on minus strand is
incorrect in case of indels in the read.
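To see why indels matter on the minus strand, consider how the reference end of an alignment follows from its CIGAR string. The sketch below is in Python and is not the package's code; it relies only on the SAM convention that M, D, N, = and X consume reference bases, and the function names are hypothetical.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REFERENCE_CONSUMING = set("MDN=X")


def reference_end(start, cigar):
    """1-based inclusive end of an alignment that starts at `start`."""
    span = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REFERENCE_CONSUMING
    )
    return start + span - 1


def tss_position(start, cigar, strand):
    """5' end of the read on the reference: start on '+', end on '-'."""
    return reference_end(start, cigar) if strand == "-" else start
```

For a 15-base read aligned at position 100 with CIGAR `10M2D5M`, the minus-strand TSS is 116, whereas adding the read length to the start would give 114.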

Partial support for loading BAM data in CAGEexp objects.
(correctSystematicG is not yet implemented)

Added multicore processing in hanabi function.

Changes in version 1.21.3:

Enforce syntactically valid sample names in CAGEexp class.

Changes in version 1.21.2:

Use a plain DataFrame as rowData in seqNameTotals slots.

Changes in version 1.21.1:

BACKWARDS-INCOMPATIBLE CHANGES

The plotting functions send their output to the graphical device
instead of writing it to a file. This makes their use more
consistent with most plotting functions in R.

NEW FEATURES

New “CAGEexp” class extending the MultiAssayExperiment class. It
stores expression data more efficiently than “CAGEset”, and uses core
Bioconductor types natively. For backwards compatibility it also
supports many of the original generic functions for “CAGEset” objects.

New “CTSS” and “TagClusters” classes wrapping GRanges objects, for
more type safety.

New functions for quality controls such as plotAnnot() or
hanabiPlot(). See the CAGEexp vignette for details.

Data export as DESeqDataSet object for DESeq2 with the new
“consensusClustersDESeq2” and “GeneExpDESeq2” functions.

New “bedctss” format to load the FANTOM5 and FANTOM6 CAGE data.

New “CAGEscanMolecule” format to load CAGEscan 3.0 data.

Multicore parallelisation with BiocParallel instead of parallel.

New function sampleList() to help looping on samples with lapply().

New plotCorrelation2() function, faster than plotCorrelation()
because it is plain black and white.

Multicore loading of CTSS data in CAGEexp object.

OTHER CHANGES

Example data “exampleCAGEexp”, “exampleCAGEexp” and
“exampleZv9_annot” are now lazy-loaded.

Passes R CMD check without errors or notes.

NULL can be passed as genome name, to circumvent the requirement for
a BSgenome object when actually not needing one.

In CAGEexp objects, expression quantile positions are given relative
to the cluster start site.

For performance reasons, the position of a quantile Q is now
calculated as the position of the first base where cumulative
expression is higher than or equal to Q% of the total expression of a
cluster.
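The quantile rule above amounts to a single pass over the cumulative sum. The following Python sketch is illustrative only; the function name is hypothetical, and the 1-based position relative to the cluster start is an assumption.

```python
from itertools import accumulate


def quantile_position(expression, q):
    """Position (1-based, relative to the cluster start) of the first base
    whose cumulative expression is >= q of the total, with q in [0, 1]."""
    threshold = q * sum(expression)
    for position, cumulative in enumerate(accumulate(expression), start=1):
        if cumulative >= threshold:
            return position
    return len(expression)
```

For per-base expression `[1, 0, 5, 3, 1]`, the 0.5 quantile falls on the third base because the cumulative sum first reaches 5 there.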

heatmapOutput() can now determine the heatmap margins and column and
row name sizes automatically.

New image formats TIFF, JPG and BMP in addition to the previous PNG
file format for heatmapOutput(). They can be chosen from
processOneStudy() and processMultipleStudies or directly from
heatmapOutput() function.

heatmapOutput() now uses two methods for ranking the genes prior to
generating heatmap(s). One of them is suited for finding genes that
have unique high values in one or few cancer studies whereas the
other method aids in determining genes that possess high values in
multiple / many cancers.

If function arguments are entered wrongly, more meaningful errors
will appear.

BUG FIX

* Fix gray boxes in plotOptimResultsPan.
* In plotOptimResultsPan, when errors were greater than 99.9% the
  colors were white (instead of red).
* C simulator bug fix: at time 0, nodes that are inhibited and
  measured were reset to 0 but inhibitors are off. The fix is simple:
  do not reset inhibitors when time is zero.
* Fixing issue with time 0 not being simulated properly (see
  https://github.com/cellnopt/CellNOptR/issues/6). This fixes a
  regression bug following the fix made in release 1.11.3.
* A node name in the SIF file containing the word AND (e.g. ligand)
  will no longer result in an AND-node.
* Fixing issues with matrix subsetting, when the subsetting converts
  the matrix to a vector.
* plotOptimResultsPan: in the computation of the root-mean-square
  error (for coloring the background), NA data is not counted in the
  number of data points.
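The last point, excluding NA values from both the sum and the count, can be illustrated as follows. This is a Python sketch with a hypothetical function name, using NaN to stand in for NA; it is not the package's code.

```python
from math import isnan, sqrt


def rmse_ignoring_na(observed, simulated):
    """Root-mean-square error over the pairs where neither value is missing."""
    pairs = [
        (o, s)
        for o, s in zip(observed, simulated)
        if not (isnan(o) or isnan(s))
    ]
    if not pairs:
        return float("nan")
    # Divide by the number of usable pairs, not by the full vector length.
    return sqrt(sum((o - s) ** 2 for o, s in pairs) / len(pairs))
```

Dividing by the full length instead would understate the error whenever measurements are missing.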

CHANGES

* Bioconductor’s version of the package was merged with the GitHub
  version, leading to minor changes.
* readSIF reads only the unique interactions/lines from the SIF file.
* plotOptimResultsPan and plotCNOlist plot intermediate cue values
  (0,1) for the CNORode add-on.

Added support for hdf5 files stored in assay slot via the
HDF5Array package

Removed most defaults from RSEC arguments – pull them from
underlying functions’ defaults.

Changes in version 1.99.3 (2018-04-17):

Changes

Re-implemented subsampleClustering() and combineMany() to use C++.

Method “adjP” in mergeClusters now allows for the further requirement
that a gene have a minimal log-fold change (‘logFCcutoff’).

Bugs

Fix bug in setBreaks (isPositive and isNegative variables)

use stringr::str_sort to make sort of character values locale
independent

Changes in version 1.99.2 (2018-03-22):

Bugs

Fix defaultNDims so that it returns the minimum of 50 and the minimum
dimension of the data.

Fix RSEC so that it still returns results if it hits an error after clusterMany.

Changes in version 1.99.0 (2018-02-15):

Changes

MAJOR CHANGE TO DEFINITION OF CLASS: This version consists of a major
update of how dimensionality reduction and filtering are done. The
class has been updated to extend the new SingleCellExperiment
class, which saves the dimensionality reductions. Furthermore,
per-gene statistics, which are usually used for filtering, are
stored in colData of the object and can be easily accessed and used
for repeated filtering without recalculating. This has created a
massive change under-the-hood in functions that allow dimensionality
reduction and filtering. Changes to function names are the following:

- transform is now transformData
- New functions makeReducedDims and makeFilterStats will calculate
  (and thus store) dimensionality reductions and statistics for
  filtering the data.
- New function filterData will return the filtered data as a matrix
- New functions listBuiltInReducedDims and listBuiltInFilterStats
  give the list of currently available functions for dimensionality
  reduction and filtering statistics, respectively.
- Filtering on arbitrary statistics and user-defined dimensionality
  reduction can be used in clusterMany and related functions, so long
  as they are saved in the appropriate slots of the object.

Changed the following functions/arguments to be consistent with
SingleCellExperiment naming conventions and improve distinction
between terminology of cluster and clustering. - Capitalized
constructor functions. Now: ClusterFunction() and
ClusterExperiment() - nPCADims now changed to nReducedDims in
clusterMany-related functions - nVarDims now changed to
nFilterDims in clusterMany-related functions - dimReduce argument
now changed to reduceMethod across functions - ndims to nDims
in clusterSingle and makeDendrogram to keep consistency. -
plotDimReduce to plotReducedDims - Changed nClusters to
nClusterings to better indicate purpose of function. nClusters
now gives the number of clusters per clustering. - addClusters to
addClusterings and removeClusters to removeClusterings. New
function removeClusters allows the user to actually “remove” a
cluster or clusters from a clustering by assigning samples in those
clusters to -1 value. - clusterInfo() to clusteringInfo()
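The distinction between removing a clustering and removing a cluster can be made concrete. In this Python sketch (hypothetical function name, not the package's API), removing a cluster only relabels its samples as -1 and leaves the clustering itself in place.

```python
def remove_clusters(assignments, clusters_to_remove):
    """Return a copy of one clustering in which samples belonging to the
    given clusters are marked as unassigned (-1)."""
    removed = set(clusters_to_remove)
    return [-1 if label in removed else label for label in assignments]
```

The clustering keeps the same length, so sample order and any other clusterings are unaffected.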

In addition to these structural changes, the following enhancements are
also included in this release - New function plotClusterLegend that
will plot a legend for a clustering. - Color definition changes:
showBigPalette has been replaced with showPalette and now can
show any palette of colors. Adjusted color definitions of seqPal2
and seqPal4 to be completely symmetric around center. The colors in
bigPalette have been changed and shuffled to reduce similar colors
and massivePalette has been created by adding all of the non-grey
colors (in random order) from colors() so that plotClusters will
not run out of colors. - getClusterManyParams: now uses saved
clusterInfo rather than more fragile clusterLabels to get
parameters. The resulting output is formatted somewhat differently. -
ClusterExperiment: removed transformation as a required argument.
Now sets with default of function(x){x}. Allows argument
clusterLegend to define the clusterLegend slot in the constructor.

plotClustersWorkflow: Argument existingColors now takes the
values ignore, all, highlightOnly similar to plotClusters -
plotDendrogram: Argument nodeColors now available. Changed
defaults so default is to do colorblock of samples. -
plotContrastHeatmap: Argument contrastColors now available to
assign colors to the contrasts. Genes are now ordered by fold-change
within each contrast. - plotClusters: argument existingColors now
allows for the option firstOnly - makeDendrogram: now allows
option ‘coCluster’ to the argument reduceMethod indicating use of
the coClustering matrix to build the dendrogram. makeDendrogram now
also has a method for building a dendrogram from an arbitrary
distance function - clusterMatrix: now returns cluster matrix with
rownames corresponding to sample names. - convertClusterLegend: now
takes argument whichClusters

Bugs

converted automatic assignment of colors in clusterLegend to be
based on massivePalette so won’t run out on toy examples.

fixed minor bugs in plotHeatmap so that it will - handle a factor with
only one value in annotation - plot annotation labels when there
is NA in the annotation - no longer call the internal function
NMF:::vplayout in making those labels, which is more robust

fixed bug in how plotClustersWorkflow handled existing colors.

Fixed so that diss is now passed to subsampling in calls to
clusterSingle/clusterMany

Fixed so plotClusters now will not give incomprehensible error if
given duplicates of a color

Release Deprecated the cg_topics() and FitGoMpool() functions.
Modified the FitGoM() function so that it has the flexibility to run
multiple runs using the num_trials argument and also returns the BIC
for each model fit. Modified the compGoM() function so that it can
take either a topic model object, as well as a list of topic model
objects. Also deprecated the base R graphics for the Structureplot
and modified the StructureGGplot() function to take single label
samples.

Changes in version 1.5.1:

Release We have added a FitGoMpool() function that automatically
performs GoM model with multiple starting points and outputs the one
run with the most optimal BIC. Besides, removed
switch_axis_position() as a dependency from cowplot as the function
has been deprecated.

nbases has replaced the nreads parameter in the learnErrors function.
As suggested by the name, this controls the amount of data the
machine learning uses by the total number of bases rather than the
read count, which is more appropriate given the range of read-lengths
in target applications.
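The difference between a read-count budget and a base-count budget can be sketched as follows. This Python example is an illustration under the assumption that whole samples are added in order until the base budget is met; it is not the package's code, and the function name is hypothetical.

```python
def samples_needed(sample_reads, nbases):
    """Add whole samples in order until their reads total at least
    `nbases` bases; `sample_reads` is a list of (name, reads) pairs."""
    chosen, total = [], 0
    for name, reads in sample_reads:
        chosen.append(name)
        total += sum(len(read) for read in reads)
        if total >= nbases:
            break
    return chosen, total
```

With long reads the budget is reached after fewer reads, which is why counting bases adapts better to varying read lengths than counting reads.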

OMEGA_C has been set to 1e-40 by default. This means that
error-correction is no longer performed on all reads, but instead
just on those with a post-hoc probability less than OMEGA_C=1e-40. In
practice this has a very small impact on final abundances.

BUG FIXES

The memory usage and speed of assignSpecies on large datasets have
significantly improved.

Changes in version 1.7.7:

NEW FEATURES

Error-correction can now be modified, and turned off, by the OMEGA_C
parameter, which controls the threshold at which reads that are
inferred to contain errors are corrected (or not) to the sequence
from which they are inferred to originate.

SIGNIFICANT USER-VISIBLE CHANGES

The DADA2 options enabling the quick gapless alignment check, and
extremely conservative greediness in the partitioning method were
turned on by default (GAPLESS=TRUE, GREEDY=TRUE). Some speedup in the
core denoising algorithm.

plotQualityProfile now includes a cumulative description of read
length variation.

Changes in version 1.7.6:

NEW FEATURES

A new and extremely conservative form of greediness in the core
denoising algorithm was added, and can be turned on by setting the
DADA2 option GREEDY=TRUE. This provides some speedup in the core
denoising algorithm.

Changes in version 1.7.5:

NEW FEATURES

The dada(…) function now accepts a list of “priors”, i.e. sequences
for which there is prior evidence they might be real. Input sequences
that match one of the priors are evaluated against a relaxed
threshold of statistical evidence (OMEGA_P instead of OMEGA_A), and
can be detected even as singletons.

The dada(…) function can perform “pseudo-pooling” with dada(…,
pool="pseudo"). In pseudo-pooling, the input samples are denoised
independently, then a set of sequences that appear in at least
MIN_PREVALENCE samples are used as priors for a second and final
round of sample inference. MIN_PREVALENCE=2 by default.
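The prior-selection step of pseudo-pooling can be sketched in a few lines. This Python example only illustrates the prevalence rule described above, not the denoising itself; the function name is hypothetical.

```python
from collections import Counter


def pseudo_pool_priors(per_sample_sequences, min_prevalence=2):
    """Sequences detected in at least `min_prevalence` samples after the
    first, independent round of denoising."""
    prevalence = Counter()
    for sequences in per_sample_sequences:
        # Count each sequence once per sample, whatever its abundance.
        prevalence.update(set(sequences))
    return {seq for seq, n in prevalence.items() if n >= min_prevalence}
```

The returned set would then be supplied as priors for the second round of sample inference.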

Changes in version 1.7.4:

NEW FEATURES

A new fast screen for optimal gapless alignments in the core
denoising algorithm was added, and can be turned on by setting the
DADA2 option GAPLESS=TRUE. This provides some speedup in the core
denoising algorithm.

Changes in version 1.7.3:

BUG FIXES

Fixed an overflow bug on sequences 260nts or longer in the SSE=2
code.

Changes in version 1.7.2:

SIGNIFICANT USER-VISIBLE CHANGES

The DADA2 option enabling explicit 8-bit SSE vectorization in the C
code was turned on by default (SSE=2). Some speedup in the core
denoising algorithm.

Changes in version 1.7.1:

NEW FEATURES

seqComplexity calculates the complexity of input sequences, and can
be used to identify and filter out low-complexity sequences.

SIGNIFICANT USER-VISIBLE CHANGES

The default minOverlap parameter of mergePairs was reduced from 20 to
12, and the alignment parameters used during merging were altered to
more strongly penalize mismatches and gaps, which improves merging
performance in repetitive sequences.

The Stacking meta-learner can be composed by the user, setting the
new parameter ‘cl_type’ of the DaMiR.EnsembleLearning() function. Any
combination of the 8 classifiers is now allowed.

If the dataset is imbalanced, a ‘Down-Sampling’ strategy is
automatically applied.

The DaMiR.FSelect() function has a new argument, ‘nPlsIter’,
which allows the user to obtain a more robust feature set. Several
feature sets are generated by the bve_pls() function (embedded
in DaMiR.FSelect()) when ‘nPlsIter’ is set to a value greater than 1.
Finally, the intersection of all the feature sets is taken to
return those features which occur in every run. However, by
default, ‘nPlsIter = 1’.
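The final intersection step can be illustrated independently of the feature selection method. In this Python sketch the function name is hypothetical and the input feature sets are assumed to come from repeated selection runs.

```python
def stable_features(feature_sets):
    """Features selected in every run, returned in sorted order."""
    sets = [set(features) for features in feature_sets]
    return sorted(set.intersection(*sets)) if sets else []
```

With a single run the result is simply that run's feature set, which matches the default of one iteration.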

DaMiR.Allplot() accepts also ‘matrix’ objects as well as NA values
(which are not plotted).

The DaMiR.normalization() function estimates the dispersion, through
the parameter ‘nFitType’; as in DESeq2 package, the argument can be
‘parametric’ (default), ‘local’ and ‘mean’.

In the DaMiR.normalization() function, the gene filtering is disabled
if ‘minCount = 0’.

In the DaMiR.EnsembleLearning() function, the method for implementing
the Logistic Regression has been changed to allow multi-class
comparisons; instead of the native ‘lm’ function, ‘bayesglm’ method
implemented in the caret ‘train’ function, properly set, is now used.

The new parameter ‘second.var’ of the DaMiR.SV() function allows the
user to take into account a secondary variable of interest (factorial
or numerical) that the user does not wish to correct for, during the
sv identification.

Version: 1.15.4
Text: 2017-03-27 Lorena Pantano lorena.pantano@gmail.com
Fix: Fix typo in variable inside degClean
Fix: Remove all columns with NA values in degClean
Feature: Plot only when degPatterns has only one gene. Thanks Amir
Jassim.
Feature: Add geom_cor to plot correlation values to a ggplot2 plot.
Feature: Add eachStep option to degPattern to apply groupDifference
to each time point and not only to the maximum and minimum values.
Feature: Add covariates dendrogram to degCovariates.
Fix: Wrong matrix in degPattern. Thanks Amir Jassim.
Feature: Add option to filter genes in degPattern. Thanks Amir Jassim.
Feature: Return raw and summarise table in degPattern
Feature: Migrate to rmarkdown for vignette
Feature: Return prcomp output when using degPCA
Fix: Typo in degPattern function, and set the use of
consensusCluster to FALSE.
Fix: degPlot to be able to work with one gene.
Feature: Add the option to look for specific patterns, or genes as
reference.
Feature: Return scaled values if scale==TRUE in degPattern.
Feature: Add values used in plots for degPattern function. Thanks to
Amir Jassim.
Feature: Get significants for a list of DEGSet objects by binding the
tables first, calculating a new FDR, and applying the filter as the
last step.
https://support.bioconductor.org/p/104059/#104072
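The last feature, binding tables before recomputing the FDR and only then filtering, can be sketched as follows. This Python example assumes the FDR is a Benjamini-Hochberg adjustment, which the note does not state; both function names and the dictionary layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_across_tables(tables, alpha=0.05):
    """Bind the tables (dicts of gene -> p-value), adjust once over all
    rows, then filter on the adjusted value as the last step."""
    rows = [(gene, p) for table in tables for gene, p in table.items()]
    adjusted = benjamini_hochberg([p for _, p in rows])
    return [(gene, adj) for (gene, _), adj in zip(rows, adjusted) if adj < alpha]
```

Adjusting once over the combined rows keeps the multiple-testing correction consistent across all contrasts, rather than applying it per table.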

Version: 1.15.2
Text: 2017-01-08 Lorena Pantano lorena.pantano@gmail.com Feature: Add
support to list for significant and recover full table.
Feature: Add support to different shrinkage estimator. Fix:
Volcano plot was plotting the shadows in the y-axis incorrectly.
Fix: Use correct option in DESeq2::results to count UP/DOWN
genes. Feature: Allow to ask for up/down genes. Thanks to
Radhika Khetani.

Fixed an issue in plotCluster() on how it was loading the
hg19IdeogramCyto object from the biovizBase package.

Changes in version 1.13.4:

BUG FIXES

Fixed an issue with a call to GenomicRanges::gaps() that affected how
the introns were plotting in plotRegionCoverage() when the underlying
data specifies the start and end of the chromosome (that is, a
seqinfo() with seqlengths specified). Thanks to Emily Burke for
reporting this issue https://github.com/emilyburke.

Added ‘lfcThreshold’ argument to lfcShrink() for use with
type="normal" and type="apeglm". For the latter, lfcShrink() will
compute FSOS s-values, for bounding when the LFC will be “false sign
or small”, where small is defined by lfcThreshold.

Switching to a ~10x faster apeglm implementation for use in the
lfcShrink() function.

Beginning the deprecation of exploratory analysis of designs without
replicates. Analysis of designs without replicates will be removed in
the Oct 2018 release: DESeq2 v1.22.0, after which DESeq2 will give an
error.

Elevate ‘minmu’ to DESeq() as this proves useful for single cell
applications and certain zero-inflated data.

Elevate ‘useT’ to DESeq(), which will use (n - p) for the degrees of
freedom of the t distribution, and if weights are provided, it will
use the sum of weights as ‘n’.
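The degrees-of-freedom rule stated above is simple arithmetic. The Python sketch below only restates it, with a hypothetical function name, and is not taken from DESeq2.

```python
def t_degrees_of_freedom(n_samples, n_coefficients, weights=None):
    """Degrees of freedom n - p, where n is the sum of the weights when
    weights are provided and the number of samples otherwise."""
    n = sum(weights) if weights is not None else n_samples
    return n - n_coefficients
```

For six samples and two coefficients this gives 4, and down-weighting two samples to 0.5 each reduces it to 3.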

Five diffusion kernels available, they can be computed from an
‘igraph’ object.

Diffusion implementations divided between ‘diffuse_raw’ for
deterministic scores and ‘diffuse_mc’ for permutation analysis, which
is parallelised. In total, seven diffusion scores are accessible
through the ‘diffuse’ function.

Performance evaluation wrapped in the ‘perf’ function.

Helper functions in helpers.R (to plot diffusion scores, to check if
a kernel matrix is actually a kernel, to extract largest CC from a
graph)

dmrseq now provides support for detecting large-scale methylation
blocks. To use this feature, specify block=TRUE, and increase the
smoothing span parameters minInSpan, bpSpan, and maxGapSmooth.
More details are provided in the documentation and vignette.

Changes in version 0.99.8 (2018-03-21):

dmrseq no longer requires balanced, two-group comparisons. To run
using a continuous or categorical covariate with more than two groups,
simply pass in the name of a column in pData that contains this
covariate. A continuous covariate is assumed if the data type in the
testCovariate slot is continuous, except when there
are only two unique values (then a two-group comparison is carried
out).

Version: 0.99.10
Category: ARGUMENT
Text: show.legend added as argument to allow the option to show or not
show the legend as per ggplot2.

Version: 0.99.9
Category: FEATURES
Text: New function called draw_recept_dom(). This function allows the
drawing of the TOPO_DOM and TRANSMEM types of receptors. Data
from TNFR1 and CD40 are included to demonstrate the function.

Version: 0.99.8
Category: FEATURES
Text: New function called extract_transcripts. This function will amend
the data frame to allow each chain from the same UniProt
accession number to be drawn separately. A vignette entitled
drawProteins_extract_transcripts has been written to
demonstrate.

Version: 0.99.8
Category: FEATURES
Text: LazyData is now false and NAMESPACE updated as per Bioconductor
review.

Version: 0.98.3
Category: FEATURES
Text: New function called draw_canvas. This function was previously
within draw_chains but has now been pulled out to allow the
generation of a canvas separately from the chains. It did
require quite a rewrite but I think it will make things more
useful. For example, it will allow the plotting of domains
without chains which has the potential to be very useful.

Version: 0.98.2
Category: FEATURES
Text: Rename functions from geom to draw. E.g. geom_chains is now
geom_draw. This is because they weren’t really geoms and using
the word draw seems more helpful and a better reflection of the
function.

The default ‘host’ is not specified, defaulting to vep default of
‘ensembldb.ensembl.org’. The default previously was
‘useastdb.ensembl.org’. Users in the US may find connection and
transfer speeds quicker using the East coast mirror,
‘useastdb.ensembl.org’. It was updated because the mirror only
supports current and current minus one vep version, and to bring the
default in line with the default of vep.

replaced QuasiSeq with edgeR for differential expression testing, due
to deprecation of QuasiSeq

users will observe significant speed increases for DE testing

users will observe changes in results from example data,
Quasi-Likelihood methods are similar for edgeR and QuasiSeq, but not
exactly identical, so P-value distributions are different, LODR
estimates have changed.

dispersion plot (dispPlot) is now generated via edgeR instead of
QuasiSeq, using a similar Quasi-Likelihood dispersion shrinkage
method.

BUG FIXES AND MINOR IMPROVEMENTS

changed behavior so that dispPlot is not deleted if DE testing is
rerun; instead, the file name is incremented to prevent overwriting.
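One common way to avoid overwriting an existing output file is to append a counter to its name. The Python sketch below illustrates the idea only; the `_1`, `_2` suffix scheme and the function name are assumptions, not the package's actual naming.

```python
import os


def next_free_name(path):
    """Return `path` if unused, otherwise the first `stem_N.ext` variant
    (N = 1, 2, ...) that does not exist yet."""
    if not os.path.exists(path):
        return path
    stem, ext = os.path.splitext(path)
    counter = 1
    while os.path.exists(f"{stem}_{counter}{ext}"):
        counter += 1
    return f"{stem}_{counter}{ext}"
```

Each rerun then writes to a fresh file and earlier plots are kept.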

BUG FIX: the G2 peak of the B sample was not getting incorporated
into model construction, which caused model fitting to fail on
samples with histograms skewed towards the left.

Updated DebrisModel documentation.

Internal Changes

Fixed broken test.

Changes in version 1.5.3 (2018-01-17):

User Visible Changes

The browser interface will no longer allow users to enter arbitrary
text to select the number of samples. Only valid values, i.e., 1, 2,
or 3, will be offered as choices. The old version would crash with a
bad value for sample number.

Improved file importing, so that low-quality samples for which peaks
cannot be detected can still be imported.

Added a new argument to FlowHist, so users can set the threshold
below which the data is ignored when screening out debris. The
default remains the same, 40, but users with very clean histograms,
with peaks far to the left, can now lower this value if needed.

Model fitting is now limited to the range of the data. That is, empty
bins at the right (upper) end of the histogram will not be included
when fitting the NLS. This addresses problems that generated inflated
RCS values.
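Restricting the fit to the range of the data means dropping the empty bins at the upper end of the histogram before fitting. A minimal Python sketch of that trimming step, with a hypothetical function name and not taken from the package:

```python
def trim_trailing_empty_bins(histogram):
    """Drop empty bins at the right (upper) end; interior and leading
    empty bins are kept."""
    last = len(histogram)
    while last > 0 and histogram[last - 1] == 0:
        last -= 1
    return histogram[:last]
```

Only the remaining bins would then contribute residuals to the fit.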

The limits for the linearity parameter have been extended from
1.9-2.1, to 1.5-2.5. This will improve/enable model fitting on
samples where the linearity (the ratio of the G2/G1 peaks) was
outside the range 1.9-2.1.

Changes in version 1.5.1 (2017-12-04):

Minor bug squashed

Changes made upstream to car::deltaMethod() introduced a bug. This
will be resolved as of car version 2.1-7. Until that version of
car makes its way into CRAN, calling deltaMethod() on an nls
object will require the argument vcov. be explicitly set to vcov,
a function in the stats package. This has been done in flowPloidy,
and should be invisible to users. In addition, the bug is not present
in the current R release, so it should not need to be backported to
previous versions of flowPloidy.

New methods assocTestSingle and assocTestAggregate are refactors of
assocTestMM and assocTestSeq/assocTestSeqWindow, respectively.
assocTestSeq and assocTestSeqWindow are deprecated. assocTestMM is
still used for GenotypeData objects, but will be deprecated in a
future release. fitNullModel is a refactor of fitNullMM/fitNullReg
and should be used with the new association test methods.

Version: 1.11.2
Category: IMPROVEMENTS AND BUG FIXES
Text: the following C++ functions return NAs instead of zeros if the
length of the vector is smaller than the number of bins:
binMean(), binMedian(), binMax(), binMin(), binSum()

makeTxDbFromUCSC() now uses direct SQL queries (to the UCSC MySQL
server at genome-mysql.soe.ucsc.edu) instead of
rtracklayer::getTable() to fetch data from the Genome Browser. This
avoids the issue reported here
https://github.com/lawremi/rtracklayer/issues/5 . Another benefit is
that direct SQL queries are much faster than rtracklayer::getTable().

SIGNIFICANT USER-VISIBLE CHANGES

The GRanges object returned by mapToTranscripts() or
pmapToTranscripts() takes the transcript lengths as seqlengths.

pmapToTranscripts() always takes the transcript name as the seqname,
even when there is no overlap. Before, it used “UNMAPPED” as the
seqname when there was no overlap.

BUG FIXES

Fix bug where ‘coverageByTranscript(x, transcripts)’ was erroring in
situations where ‘transcripts’ contains transcripts with an exon that
receives no coverage and is located on a sequence for which the
seqlength is not available in ‘seqinfo(x)’ nor in
‘seqinfo(transcripts)’.

2 improvements to the “promoters” method for GenomicRanges objects:
- The ‘upstream’ and ‘downstream’ arguments now can be integer vectors
  parallel to ‘x’.
- The ‘use.names’ argument now is supported. This is for consistency
  with the other intra range transformations.

SIGNIFICANT USER-VISIBLE CHANGES

GenomicRanges now is a List subclass. This means that GRanges objects
and their derivatives are now considered list-like objects (even
though [[ doesn’t work on them yet; this will be implemented in
Bioconductor 3.8).

Add the CompressedGRangesList class as a replacement for the
GRangesList class. The long term goal is that GRangesList becomes a
virtual class with CompressedGRangesList as a concrete subclass. Note
that the GRangesList() constructor now returns a
CompressedGRangesList instance instead of a GRangesList instance.

GenomicRangesList is now a virtual class (like IntegerRangesList is).

GRanges derivatives no longer support the ‘x[i, j] <- value’ form of
subassignment. This feature was of very limited usefulness and no
Bioconductor package was using it.

Improve performance of nearest(), precede(), and follow() on a
GRanges object.

Improve performance of coverage() on a GPos object.

Improve performance of sort() on a GRangesList object. Also now it
supports ‘ignore.strand’. See
https://github.com/Bioconductor/GenomicRanges/issues/1 (and note how
unnicely these changes were requested).

Improve performance and error handling of coercion from RleList to
GRanges. This is a 50x speedup or more when the RleList object to
coerce has thousands of list elements or more.

BUG FIXES

Fix coercion from RleList to GRanges when some list elements in the
object to coerce have length 0 (see
https://support.bioconductor.org/p/105926/ for original report by
Xiaotong Yao).

Fix bug in nearest() when an unstranded range in ‘query’ precedes or
follows more than one range in ‘subject’.

The function ‘scores()’ has been deprecated and replaced by the
function ‘gscores()’.

The argument ‘scores.only’ in the function ‘scores()’ has been
deprecated and replaced by calling the function ‘score()’.

The ‘MafDb’ class has been deprecated and now the ‘GScores’ class
supports former ‘MafDb’ objects. The ‘mafByOverlaps()’ and
‘mafById()’ functions have been deprecated and replaced by the
function ‘gscores()’. The ‘populations()’ function from the ‘MafDb’
API has been integrated into the ‘GScores’ API.

Added metadata on genomic scores groups, available through the
function ‘gscoresGroups()’, on availability of non-single nucleotide
regions through the function ‘gscoresNonSNRs()’, and on the default
population used through the function ‘defaultPopulation()’.

New AnnotationHub resources have been added during this release
cycle: phyloP60way.UCSC.mm10, LINSIGHT, phastCons46wayPlacental,
phastcons46wayPrimates.

Added a BiocSticker at
https://github.com/Bioconductor/BiocStickers/tree/master/GenomicScores

Added citation information after package publication has been
accepted at Bioinformatics.

Add the windows() generic with various methods. This is a “parallel”
version of window() for list-like objects i.e. it does
‘mendoapply(window, x, start, end, width)’ but uses a fast
implementation. Also add heads() and tails() as convenience wrappers
around windows(). They do ‘mendoapply(head, x, n)’ and
‘mendoapply(tail, x, n)’, respectively, but use a fast
implementation. They’re replacements for S4Vectors::phead() and
S4Vectors::ptail() which are now deprecated.
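As a rough picture of what “parallel” means here, the sketch below
applies a window, head, or tail to each element of a list using one
value per element. It is not the IRanges implementation: plain Python
lists stand in for list-like objects, only start/end (1-based,
inclusive) are handled, and only non-negative n is supported.

```python
def windows(x, start, end):
    """Element-wise window(): one 1-based, inclusive (start, end) per element."""
    if not (len(x) == len(start) == len(end)):
        raise ValueError("x, start and end must be parallel")
    return [elt[s - 1:e] for elt, s, e in zip(x, start, end)]


def heads(x, n):
    """Element-wise head(): keep the first n[i] items of x[i]."""
    return [elt[:k] for elt, k in zip(x, n)]


def tails(x, n):
    """Element-wise tail(): keep the last n[i] items of x[i]."""
    return [elt[max(len(elt) - k, 0):] for elt, k in zip(x, n)]


x = [[1, 2, 3, 4], [10, 20, 30], [7]]
print(windows(x, [2, 1, 1], [3, 2, 1]))
print(heads(x, [2, 2, 2]))
print(tails(x, [2, 2, 2]))
```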

Add equisplit() to split a vector-like object into a specified number
of partitions with equal (total) width. This is useful for instance
to ensure balanced loading of workers in parallel evaluation.
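The idea of partitions with equal total width can be sketched as
follows. This is a hypothetical illustration rather than the IRanges
code: ranges are assumed to be (start, end) tuples with inclusive
integer coordinates, ranges are cut where needed, and when the total
width is not divisible by the number of partitions the widths are
assumed to differ by at most one.

```python
def equisplit(ranges, nchunk):
    """Split (start, end) ranges into nchunk partitions of equal total width."""
    if nchunk < 1:
        raise ValueError("nchunk must be at least 1")
    total = sum(end - start + 1 for start, end in ranges)
    base, extra = divmod(total, nchunk)
    # Target width of each partition; the first `extra` get one more position.
    targets = [base + (1 if i < extra else 0) for i in range(nchunk)]

    queue = [[start, end] for start, end in ranges]  # mutable copies
    pos = 0
    partitions = []
    for target in targets:
        part, need = [], target
        while need > 0:
            start, end = queue[pos]
            take = min(need, end - start + 1)
            part.append((start, start + take - 1))
            need -= take
            if start + take > end:
                pos += 1                      # range fully consumed
            else:
                queue[pos][0] = start + take  # keep the remainder
        partitions.append(part)
    return partitions


# Total width is 30, so each of the 3 partitions covers 10 positions.
for part in equisplit([(1, 10), (21, 25), (31, 45)], 3):
    print(part)
```

Each partition could then be handed to one worker, which is the
load-balancing use case mentioned above.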

promoters() arguments ‘upstream’ and ‘downstream’ now can be integer
vectors parallel to ‘x’ (for consistency with the other intra range
transformations).

The promoters() generic and methods get the ‘use.names’ argument.

Add “resize”, “flank”, and “restrict” methods for Views objects.

Add “as.integer” method for Pos objects (equivalent to pos()).

SIGNIFICANT USER-VISIBLE CHANGES

The Ranges virtual class is now the common parent of the IRanges,
GRanges, and GAlignments classes (GRanges and GAlignments are defined
in the GenomicRanges and GenomicAlignments packages, respectively).
More precisely, Ranges is a virtual class that now serves as the
parent class for any class that represents a vector of ranges. The
ranges can be integer ranges (i.e. ranges on the space of integers)
like in an IRanges object, or genomic ranges (i.e. ranges on a
genome) like in a GRanges object. Note that because Ranges extends
List, all Ranges derivatives are considered list-like objects. This
means that GRanges objects and their derivatives are considered
list-like objects, which is new (even though [[ doesn’t work on them
yet; this will be implemented in Bioconductor 3.8).

Similarly the RangesList virtual class is now the common parent of
the IRangesList, GRangesList, and GAlignmentsList classes.

IRanges objects don’t support [[, unlist(), as.list(), lapply(), and
as.integer() anymore. This is a temporary situation only. These
operations will be re-introduced in Bioconductor 3.8 but with
different semantics. The overall goal of all these changes is to bring
more consistency between IRanges and GRanges objects (GRanges objects
will also support [[, unlist(), as.list(), and lapply() in
Bioconductor 3.8). The non-exported IRanges:::unlist_as_integer() helper
is a temporary replacement for what unlist() and as.integer() used to
do on an IRanges object.

Move the pos() generic to BiocGenerics.

Switch order of breakInChunks() arguments ‘chunksize’ and ‘nchunk’ to
be consistent with tileGenome().

tile() and slidingWindows() now preserve names.

Optimize [[<- on a CompressedList object. Was very inefficient. The
optimized method can be up to 100x faster or more on a long object.

All the S4Vectors-specific material in the IRangesOverview.Rnw
vignette has moved to the new S4VectorsOverview.Rnw vignette located
in the S4Vectors package.

DEPRECATED AND DEFUNCT

Deprecate the RangesList() constructor. IRangesList() should be used
instead.

The “ranges” methods for Hits and HitsList objects are now defunct
(were deprecated in BioC 3.6).

The “overlapsAny”, “subsetByOverlaps”, “coverage” and “range” methods
for RangedData objects are now defunct (were deprecated in BioC 3.6).

The universe() getter and setter as well as the ‘universe’ argument
of the RangesList(), IRangesList(), RleViewsList(), and RangedData()
constructor functions are now defunct (were deprecated in BioC 3.6).

Deprecated the two separate functions, logomaker and nlogomaker, for
standard and EDLogo plots. All logo plots can now be generated using
the same function, logomaker(). The type argument in this function
can be set to Logo or EDLogo.

trimmed the package down from nearly 60 exported functions to just 7
exported functions.

The format of the input data is now made more flexible - it allows
for a vector of character sequences, along with the PFM or the PWM
matrix as before (see vignette).

Changed the complicated color_profile argument into three separate
arguments: color_type, similar to the former color_profile$type;
colors, which lets the user choose a cohort of colors; and color_seed,
which lets the user sample different colors from the cohort. A default
cohort of colors and a default color_type (per-row) are now provided,
so users no longer need to define color_profile at all: they can rely
on the defaults and vary the colors drawn from the default cohort
through color_seed (see vignette).

added a return_heights option in logomaker() function that, when
set to TRUE, returns the information of the heights of the stacks
used for both standard and EDLogo (see vignette).

Added a use_dash argument that, when set to TRUE, automatically
detects whether the input is a character sequence or a PFM matrix and
performs adaptive scaling of heights (see vignette).

updated the vignette completely with major focus on the EDLogo
representation and the use of the current logomaker() functionality

updated the README - with citation information and a demo example
added.

Updated the gallery code
(https://kkdey.github.io/Logolas-pages/Gallery.html) to conform to
the new system of functions.

Updated the HTML vignette
(https://kkdey.github.io/Logolas-pages/workflow.html) to match with
the pdf version of the vignette attached with the package.

Added preliminary support for DelayedArray-backed minfi objects. This
allows disk-backed minfi objects (e.g., using HDF5). This
functionality is currently recommended only for developers and
advanced users. A user-friendly interface is currently in
development. All existing minfi functionality and serialized objects
should continue to work as it did in versions prior to 1.25. Please
report any problems to the GitHub issue tracker.

Fixing bug in functions readGEORawFile() and
getGenomicRatioSetFromGEO(). These two functions did not work
(reported an error). They should work now. Thanks to users who
reported problems at GitHub issues.

knn-based density peak clustering is not general for all datasets.
Rolled back to the previous densityPeak clustering algorithm and set
it to be the default algorithm. A new Louvain clustering algorithm
for dealing with large datasets (> 50 k cells) is added.

Add parameter timeDomain to combineSpectra,
combineSpectraMovingWindow and estimateMzScattering allowing to
perform the grouping of m/z values from consecutive scans based on
the square root of the m/z values <2018-03-29>.

Version: 3.6
Category: Bugfixes
Text: To determine the transcriptional strand of mutations in genes,
all mutations that overlap with multiple genes were excluded.
When these genes are on different strands, it indeed cannot be
determined whether a mutation is on the transcribed or
untranscribed strand. However, if these overlapping genes are
all on the same strand, the transcriptional strand can be
determined, but these mutations were unnecessarily removed from
the analysis. This bug is now fixed, and as a result more mutations
are now included in the analysis. This bugfix influences the
results of: ‘mut_strand’ (previously ‘strand_from_vcf’) and
‘mut_matrix_stranded’

Version: 3.6
Category: New features & parameter changes
Text: Replicative strand bias analyses:
- ‘mut_strand’ and ‘mut_matrix_stranded’ can now be executed in two
  modes: ‘transcription’ (default) or ‘replication’
- All downstream analyses can be performed for both modes with
  ‘strand_occurrences’, ‘strand_bias_test’ and ‘plot_strand_bias’

Version: 3.6
Category: Interface changes
Text: ‘read_vcfs_as_granges’: The ‘genome’ parameter must now be the
name of a BSgenome library, to prevent problems with seqlevels
style. The function now accepts an optional ‘group’ parameter
to use a subset of chromosomes. It also accepts the new
optional ‘check_alleles’ parameter to significantly speed up
the reading of VCF files.

first release of ORFik - find Open Reading Frames, automatic RiboSeq
footprint shifts, reassignment of Transcription Start Sites with the
use of CageSeq, plethora of gene identity functions from scientific
publications

Because “getTable()” of rtracklayer produces a Bad Request error, we
have temporarily disabled the code checking of the
“PrepareAnnotationRefseq2” function until the bug is fixed.
Please use “PrepareAnnotationEnsembl2” to prepare the annotation
file instead of “PrepareAnnotationRefseq2”.

A bug in the string-based filtering tool was fixed. The case where an
entity could be both a contaminant and a reverse was not taken into
account. This led to wrong numbers in the plot.

Correction of the behaviour of the table in the experimental design
(convert Data tool). When the user copies and pastes some lines, it
may add unneeded rows. These rows can be deleted with an option in the
contextual menu.

Bugfix in the ‘rUGgmm()’ function when called with an undirected
graph defined by numeric (integer) vertices, to respect the numeric
ordering of those vertex labels, i.e., avoid simulating a graph with
ordered vertices “1”, “10”, “11”, etc., and get instead “1”, “2”,
“3”, etc.
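The two orderings contrasted above are easy to reproduce in a few
lines. This is only an illustration of lexicographic versus numeric
sorting of integer labels stored as strings; it says nothing about how
rUGgmm() orders vertices internally.

```python
# Integer vertex labels stored as strings, as in "1", "2", ..., "12".
labels = [str(i) for i in range(1, 13)]

lexicographic = sorted(labels)        # compares character by character
numeric = sorted(labels, key=int)     # compares the integer values

print(lexicographic[:4])  # ['1', '10', '11', '12']
print(numeric[:4])        # ['1', '2', '3', '4']
```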

Main update: Re-formatted ranking databases. They are now loaded
from .feather files (and therefore, they are transposed: motifs are
stored as rows, and genes/regions as columns, which allows loading
only specific genes/regions).

Change some examples to dontrun and improve the code that cleans up
after the tests. This should reduce the size of files left in tmp
although they didn’t seem too big to begin with.

Changes in version 1.5.9:

SIGNIFICANT USER-VISIBLE CHANGES

The functions add_metadata() and add_predictions() now return the
sample metadata or predictions when the ‘rse’ argument is missing.

Changes in version 1.5.6:

NEW FEATURES

Added the function add_metadata() which can be used to append curated
metadata to a recount rse object. Currently, add_metadata() only
supports the recount_brain_v1 data available at
http://lieberinstitute.github.io/recount-brain/ and to be further
described in Razmara et al, in prep, 2018.

Fix a unit test for download_study(), add another test for the
versions, and fix a NOTE in R CMD check.

Changes in version 1.5.3:

NEW FEATURES

download_study() can now download the transcript counts
(rse_tx.RData) files. The transcript estimation is described in Fu et
al, 2018.

SIGNIFICANT USER-VISIBLE CHANGES

download_study() now has a version parameter (defaults to 2). This
argument controls which version of the files to download based on the
change on how exons were defined. Version 1 are reduced exons while
version 2 are disjoint exons as described in further detail in the
documentation tab of the recount website
https://jhubiostatistics.shinyapps.io/recount/.

recount_url and the example rse_gene_SRP009615 have been updated to
match the changes in version 2.

The package gets a new vignette: S4VectorsOverview.Rnw The material
in this new vignette comes from the IRangesOverview.Rnw vignette
located in the IRanges package. All the S4Vectors-specific material
was moved from the IRangesOverview.Rnw vignette to the new
S4VectorsOverview.Rnw vignette.

All Vector derivatives now support ‘x[i, j]’ by default. This allows
the user to conveniently subset the metadata columns thru ‘j’. Note
that GenomicRanges objects have been supporting this feature for
years but now all Vector derivatives support it. Developers of Vector
derivatives with a true 2-D semantic (e.g. SummarizedExperiment) need
to override this.

rank() now supports ‘by’ on Vector derivatives.

Add concatenateObjects() generic and methods for LLint, vector,
Vector, Hits, and Rle objects. This is a low-level generic intended
to facilitate implementation of c() on vector-like objects. The
“concatenateObjects” method for Vector objects concatenates the
objects by concatenating all their parallel slots. The method behaves
like an endomorphism with respect to its first argument ‘x’. Note
that this method will work out-of-the-box and do the right thing on
most Vector subclasses as long as parallelSlotNames() reports the
names of all the parallel slots on objects of the subclass (some
Vector subclasses might require a “parallelSlotNames” method for this
to happen). For those Vector subclasses on which concatenateObjects()
does not work out-of-the-box or does not do the right thing, it is
strongly advised to override the method for Vector objects rather
than trying to override the (new) “c” method for Vector objects with
a specialized method. The specialized “concatenateObjects” method
will typically delegate to the method below via the use of
callNextMethod(). See “concatenateObjects” methods for Hits and Rle
objects for some examples. No Vector subclass should need to override
the “c” method for Vector objects.

Major refactoring of [[<- for List objects. It’s now based on a new
“setListElement” method for List objects that relies on [<- for
replacement, c() for appending, and [ for removal, which are the 3
operations that setListElement() can perform (depending on how it’s
called). As a consequence [[<- now works out-of-the-box on any List
derivative for which [<-, c(), and [ work.

SIGNIFICANT USER-VISIBLE CHANGES

endoapply() and mendoapply() are now regular functions instead of
generic functions.

A couple of minor improvements to how default “showAsCell” method
handles list-like and non-list like objects.

Replace strsplitAsListOfIntegerVectors() with
toListOfIntegerVectors(). (The former is still available but
deprecated in favor of the latter.) The input of
toListOfIntegerVectors() now can be a list of raw vectors (in
addition to a character vector), in which case it is treated as
if it were ‘sapply(x, rawToChar)’.

A couple of optimizations to the “[<-” method for DataFrame objects
(see commit e63f4cfd637e3471e4b04015c2938348df17e14a).

DEPRECATED AND DEFUNCT

phead() and ptail() are deprecated in favor of IRanges::heads() and
IRanges::tails().

strsplitAsListOfIntegerVectors() is deprecated in favor of
toListOfIntegerVectors().

BUG FIXES

The mcols() setter no longer tries to downgrade to DataFrame a
supplied right value that extends DataFrame (e.g. DelayedDataFrame).

‘DataFrame(I(x))’ and ‘as(I(x), “DataFrame”)’ now drop the I()
wrapping before storing ‘x’ in the returned object. This wrapping was
ugly, not needed, and breaking S4 objects.

Fix a couple of long-standing bugs in DataFrame subassignment:
- Bug in the “[<-” method for DataFrame objects where replacing the
  1st variable with a rectangular object (e.g. x1 <-
  DataFrame(aa=I(matrix(1:6, ncol=2)))) was returning a DataFrame
  with the “nrows” slot set incorrectly.
- A couple of bugs in the “replaceROWS” method for DataFrame objects
  when used in “rbind mode” i.e. when max(i) > nrow(x).

Fix bug in “cbind” method for DataFrame where it was appending X to
the column names in some situations (see
https://github.com/Bioconductor/S4Vectors/issues/8).

Modified decomposeVar() to return statistics (but not p-values) for
spike-ins when get.spikes=NA. Added block= argument for mean/variance
calculations within each level of a blocking factor, followed by
reporting of weighted averages (using Fisher’s method for p-values).
Automatically record global statistics in the metadata of the output
for use in combineVar(). Switched output to a DataFrame object for
consistency with other functions.
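For readers unfamiliar with Fisher's method, the sketch below shows
how independent p-values are combined: the statistic -2 * sum(log(p))
is compared with a chi-squared distribution on 2k degrees of freedom.
This is a generic textbook illustration with a hypothetical function
name, not the scran code, and it assumes independent p-values in
(0, 1].

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    For even degrees of freedom 2k the chi-squared upper tail has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, used here with
    x/2 = -sum(log(p)).
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)
    term = total = 1.0
    for i in range(1, len(pvalues)):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-block p-values for one gene.
print(fisher_combine([0.05, 0.05]))
```

With a single p-value the function returns that p-value unchanged,
which is a quick sanity check on the formula.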

Fixed testVar() to report a p-value of 1 when both the observed and
null variances are zero.

Allowed passing of arguments to irlba() in denoisePCA() to assist
convergence. Reported low-rank approximations for all genes,
regardless of whether they were used in the SVD. Deprecated design=
argument in favour of manual external correction of confounding
effects. Supported use of a vector or DataFrame in technical= instead
of a function.

Allowed passing of arguments to prcomp_irlba() in buildSNNGraph() to
assist convergence. Allowed passing of arguments to get.knn(),
switched default algorithm back to a kd-tree.

Added the buildKNNGraph() function to construct a simple
k-nearest-neighbours graph.

Fixed a number of bugs in mnnCorrect(), migrated code to C++ and
parallelized functions. Added variance shift adjustment, calculation
of angles with the biological subspace.

Modified trend specification arguments in trendVar() for greater
flexibility. Switched from ns() to robustSmoothSpline() to avoid bugs
with unloaded predict.ns(). Added block= argument for mean/variance
calculations within each level of a blocking factor.

Added option to avoid normalization in the SingleCellExperiment
method for improvedCV2(). Switched from ns() to smooth.spline() or
robustSmoothSpline() to avoid bugs.

Replaced zoo functions with runmed() for calculating the median trend
in DM().

Added block= argument to correlatePairs() to calculate correlations
within each level of a blocking factor. Deprecated the use of
residuals=FALSE for one-way layouts in design=. Preserve input order
of paired genes in the gene1/gene2 output when pairings!=NULL.

Added block= argument to overlapExprs() to calculate overlaps within
each level of a blocking factor. Deprecated the use of
residuals=FALSE for one-way layouts in design=. Switched to automatic
ranking of genes based on ability to discriminate between groups.
Added rank.type= and direction= arguments to control ranking of
genes.

Modified combineVar() so that it is aware of the global stats
recorded in decomposeVar(). Absence of global statistics in the input
DataFrames now results in an error. Added option to method= to use
Stouffer’s method with residual d.f.-weighted Z-scores. Added
weighted= argument to allow weighting to be turned off for equal
batch representation.
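The weighted Stouffer combination mentioned above can be sketched as
follows. This is a generic illustration, not the combineVar() code:
the function name is hypothetical, the p-values are assumed to be
one-sided, independent, and strictly between 0 and 1, and the weights
stand in for quantities such as residual d.f.

```python
import math
from statistics import NormalDist


def stouffer_combine(pvalues, weights=None):
    """Combine one-sided p-values via (optionally weighted) Z-scores."""
    if weights is None:
        weights = [1.0] * len(pvalues)  # unweighted: equal representation
    if len(weights) != len(pvalues):
        raise ValueError("weights must be parallel to pvalues")
    norm = NormalDist()
    z = [norm.inv_cdf(1.0 - p) for p in pvalues]
    denom = math.sqrt(sum(w * w for w in weights))
    if denom == 0.0:
        raise ValueError("at least one weight must be non-zero")
    combined_z = sum(w * zi for w, zi in zip(weights, z)) / denom
    return 1.0 - norm.cdf(combined_z)


# Hypothetical p-values from two batches with 10 and 30 residual d.f.
print(stouffer_combine([0.04, 0.20], weights=[10, 30]))
print(stouffer_combine([0.04, 0.20]))  # equal weighting
```

Passing no weights corresponds to the situation where weighting is
turned off so that every batch counts equally.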

Modified the behaviour of min.mean= in computeSumFactors() when
clusters!=NULL. Abundance filtering is now performed within each
cluster and for pairs of clusters, rather than globally.

Switched to pairwise t-tests in findMarkers(), rather than fitting a
global linear model. Added block= argument for within-block t-tests,
the results of which are combined across blocks via Stouffer’s
method. Added lfc= argument for testing against a log-fold change
threshold. Added log.p= argument to return log-transformed
p-values/FDRs. Removed empirical Bayes shrinkage as well as the
min.mean= argument.

Added the makeTechTrend() function for generating a mean-variance
trend under Poisson technical noise.

Added the multiBlockVar() function for convenient fitting of multiple
mean-variance trends per level of a blocking factor.

Added the clusterModularity() function for assessing the cluster-wise
modularity after graph-based clustering.

Added the parallelPCA() function for performing parallel analysis to
choose the number of PCs.

rowRanges() now is supported on a SummarizedExperiment object that is
not a RangedSummarizedExperiment, and returns NULL. Also doing
‘rowRanges(x) <- NULL’ on a RangedSummarizedExperiment object now is
supported and degrades it to a SummarizedExperiment instance.

saveHDF5SummarizedExperiment() and loadHDF5SummarizedExperiment() are
now in the HDF5Array package.

Replace old “updateObject” method for SummarizedExperiment objects
with a new one. The new method calls updateObject() on all the assays
of the object. This will update SummarizedExperiment objects (and
their derivatives like BSseq objects) that have “old” DelayedArray
objects in their assays. The old method has been around since BioC
3.2 (released 2.5 years ago) and was used to update objects made
prior to the change of internals that happened between BioC 3.1 and
BioC 3.2. All these “old” objects should have been updated by now so
we don’t need this anymore.

BUG FIXES

Modify the “[<-” method for SummarizedExperiment to leave
‘metadata(x)’ intact instead of trying to combine it with
‘metadata(value)’. With this change ‘x[i , j] <- x[i , j]’ behaves
like a no-op (as expected) instead of duplicating metadata(x).

The SummarizedExperiment() constructor does not try to downgrade the
supplied rowData and/or colData to DataFrame anymore if they derive
from DataFrame.

Parse the “spectrumId” column of the mzML header to find the scan
number (instead of the “acquisitionNum”) because ProteomDiscover
generates a non-standard “spectrumId” and proteowizard fails to
translate it into a valid “acquisitionNum”. See #73 for details
[2018-02-22].

Allow the user to decide how to handle redundant fragment matching.
Current default is redundantFragmentMatch="remove" and
redundantIonMatch="remove". This will reduce the number of fragment
matches. Choose "closest" for both to get the old behaviour. See
also #72 [2018-01-29].

ch2locs (retrievable via dsQTL::getSNPlocs) has been changed at about
1850 locations where rs numbers had been associated with hg19
addresses; the dsQTL regions are hg18 as are all the chr2… SNP
addresses. Previously the discoverable rs numbers used in the
Chicago distribution from
http://eqtl.uchicago.edu/dsQTL_data/GENOTYPES/ had been mapped via
SNPlocs…20111119, but now they come directly from the Chicago text
file.

The dataset TargetSearchData, which used to contain TargetSearch
objects, has been moved from package TargetSearchData to package
TargetSearch. Note that the dataset has been renamed to
TSExample. To load it, use data(TSExample) within TargetSearch.