predictStrand-methods: predictStrand

Description

The function evaluates transcription initiation within a peak region by
comparing RNA-seq read densities upstream and downstream of an empirically
determined transcription start site. Putative transcription on both the forward
and reverse genomic strands is tested, and the results are stored with each
ChIP-seq peak.

Arguments

cdsObj

A ChipDataSet object.

tdsObj

A TranscriptionDataSet object.

coverage.cutoff

Numeric. A cutoff value used to discard regions with
low fragment coverage, which represent expression noise. By default,
the value stored in the coverageCutoff slot of the supplied
TranscriptionDataSet object is used. The optimal cutoff value can
be calculated with the estimateBackground function.

quant.cutoff

Numeric. A cutoff value for the cumulative
distribution of the RNA-seq signal along the ChIP-seq peak region. Must
be in the range (0, 1). For details, see step 1 in the "Details"
section below. Default: 0.1.

win.size

Numeric. The size of the q1 and q2 regions
flanking the transcription start position on the 5' and 3' sides,
respectively. For details, see step 2 in the "Details" section below.
Default: 2500.

prob.cutoff

Numeric. A cutoff value for the probability of reads
being sampled from the q2 flanking region. If not supplied, the value
estimated from the data is used. Must be in the range (0, 1). For
details, see step 6 in the "Details" section below.

Details

RNA-seq data is incorporated to find direct evidence of active
transcription from every putatively gene-associated peak. To do
this, we determine the 'strandedness' of the ChIP-seq peaks using
strand-specific RNA-seq data. The following assumptions are made in order to
retrieve the peak 'strandedness':

Transcription initiation occurs within the ChIP peak region.

When a ChIP peak is associated with a transcription initiation
event, we expect to see a strand-specific increase in RNA-seq
fragment count downstream of the transcription initiation site.

Each peak in the data set is tested for association with transcription
initiation on both strands of DNA. Steps 1-5 are performed for the
forward and reverse DNA strands separately, and step 6 combines the data
from both strands. If the peak is identified as associated with
transcription on both strands, then it is considered bidirectional.
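
The way the two per-strand results are combined can be illustrated with a minimal sketch. It is not the package's implementation: the function name combine_strands, the use of None for a strand that was discarded, and the returned labels are all hypothetical. It only assumes, as described in this section, that each strand yields a probability P(q2) and that a strand is called when that probability exceeds the cutoff.

```python
def combine_strands(p_plus, p_minus, prob_cutoff):
    """Combine per-strand P(q2) values into one call for a peak.

    p_plus, p_minus: P(q2) for the forward and reverse strand, or None
    when the strand was discarded (hypothetical convention).
    The returned labels are illustrative, not the package's own values.
    """
    if not 0 < prob_cutoff < 1:
        raise ValueError("prob_cutoff must be in the range (0, 1)")

    # A strand is called only when its P(q2) exceeds the cutoff.
    plus = p_plus is not None and p_plus > prob_cutoff
    minus = p_minus is not None and p_minus > prob_cutoff

    if plus and minus:
        return "bidirectional"
    if plus:
        return "+"
    if minus:
        return "-"
    return "none"
```

For example, under these assumptions a peak with P(q2) of 0.9 on the forward strand and 0.8 on the reverse strand would be labelled bidirectional at a cutoff of 0.6.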

ChIP peak 'strandedness' prediction steps:

1. Identify a location within the ChIP-seq peak near the
transcription start site. This is accomplished by calculating the
cumulative distribution of RNA-seq fragments within the peak region.
The selected position is the one at which (1 - 'quant.cutoff') * 100% of
the RNA-seq fragments lie downstream. This approach performs well
in both gene-poor and gene-dense regions, where transcripts may overlap.

2. Two equally sized regions (q1 and q2) are defined, flanking the
position identified in step 1 on both sides. RNA-seq fragments are
counted in each region.

3. ChIP peaks with an RNA-seq fragment coverage below an estimated
threshold are discarded from the analysis.

4. The probability is calculated for RNA-seq fragments to be
sampled from either q1 or q2. Based on the assumptions stated
above, a ChIP peak that is associated with transcription initiation
should have more reads in q2 (downstream of the transcription start
position) than in q1, and consequently the probability of a
fragment being sampled from q2 should be higher.

5. ChIP-seq peaks are divided into gene-associated and background
peaks based on the prediction.

6. The optimal P(q2) threshold is identified iteratively as the value that
balances the False Discovery Rate (FDR) and the False Negative Rate
(FNR). Peaks with a P(q2) exceeding the estimated threshold are
considered to be associated with a transcription initiation event.
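
The position search, the window counts and the q2 probability for a single strand can be sketched as follows. This is an illustration under stated assumptions, not the package's code. The function names (tss_position, flank_counts, prob_q2) are hypothetical, each fragment is reduced to a single coordinate, window boundaries are chosen arbitrarily, and P(q2) is taken to be the simple proportion q2 / (q1 + q2). The coverage filter and the threshold search are not covered.

```python
def tss_position(fragment_positions, quant_cutoff=0.1, strand="+"):
    """Position with roughly (1 - quant_cutoff) of the fragments downstream."""
    if not 0 < quant_cutoff < 1:
        raise ValueError("quant_cutoff must be in the range (0, 1)")
    if not fragment_positions:
        return None
    # Order fragments in the direction of transcription, so that
    # "downstream" always means "later in the list".
    ordered = sorted(fragment_positions, reverse=(strand == "-"))
    return ordered[int(quant_cutoff * len(ordered))]


def flank_counts(fragment_positions, tss, win_size=2500, strand="+"):
    """Count fragments in q1 (upstream) and q2 (downstream) of tss."""
    if strand == "+":
        q1 = sum(tss - win_size <= p < tss for p in fragment_positions)
        q2 = sum(tss <= p < tss + win_size for p in fragment_positions)
    else:
        # On the reverse strand, downstream means lower coordinates.
        q1 = sum(tss < p <= tss + win_size for p in fragment_positions)
        q2 = sum(tss - win_size < p <= tss for p in fragment_positions)
    return q1, q2


def prob_q2(q1, q2):
    """Assumed estimate: the share of flanking fragments that fall in q2."""
    total = q1 + q2
    return q2 / total if total else None


# Toy forward-strand example with made-up coordinates.
positions = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
tss = tss_position(positions, quant_cutoff=0.25, strand="+")
q1, q2 = flank_counts(positions, tss, win_size=500, strand="+")
print(tss, q1, q2, prob_q2(q1, q2))
```

In this toy case two of the eight fragments lie upstream of the selected position, so q2 holds more fragments than q1 and the estimated P(q2) is above 0.5, which is the pattern expected for a peak associated with transcription initiation on that strand.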

Value

The strandPrediction slot of the provided
ChipDataSet object will be updated with the following
elements: 'predicted.strand', 'probability.cutoff', 'results.plus' and
'results.minus'.