Running RSeq Workflow

This step maps the FastQ/FASTA RNA-Seq data to the genome and transcriptome, then produces a QC report of the sequencing data, SNP calling results, a wig file for the genome browser, and expression levels and read counts for genes, exons, and splice junctions.

Workflow Input Files
The RSeq workflow requires the following inputs:

FastQ File

Bustard Summary File

Precomputed PerM or Bowtie index files
Note: a. If you plan to use the PerM mapper, you only need to download the PerM index files.
b. You need not download all index files. Download only the index files for the gender and read length corresponding to your input sample.

Genome FA file
Note: You need not download all FA files. Download only the FA file for the gender corresponding to your input sample.

Obtaining Input File(s)
Sample FastQ/index files are provided here. Additionally, you may use your own FastQ files as input.

To use any input file with the RSeq workflow you need to download the files into the Virtual Machine.

Steps to run the RSeq Workflow
The RSeq workflow currently supports only samples with a read length of 50, 75, or 100. Ensure that the read length of your sample files is supported.
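One way to check the read length before starting is a short script such as the sketch below. It is not part of the RSeq tooling: the function names are hypothetical, and it assumes a plain-text FastQ file with exactly four lines per record (no wrapped sequence lines). It samples the first records of the file and reports whether every read length found is one of the supported values.

```python
from collections import Counter
from itertools import islice

# Read lengths this guide lists as supported by the workflow.
SUPPORTED_READ_LENGTHS = {50, 75, 100}


def read_lengths(fastq_path, max_reads=1000):
    """Count sequence lengths in the first max_reads records of a FastQ file.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    lengths = Counter()
    with open(fastq_path) as handle:
        for i, line in enumerate(islice(handle, max_reads * 4)):
            if i % 4 == 1:  # the sequence is the second line of each record
                lengths[len(line.strip())] += 1
    return lengths


def is_supported(fastq_path):
    """True if the file has reads and all sampled lengths are supported."""
    lengths = read_lengths(fastq_path)
    return len(lengths) > 0 and set(lengths) <= SUPPORTED_READ_LENGTHS
```

For example, is_supported("my-own-fastq.txt") would return True for a file whose sampled reads are all 100 bases long, and False if any sampled read has another length such as 36.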
The explanation below assumes that you are using your own sample with the workflow. The sample FastQ file is named my-own-fastq.txt and the Bustard summary file is named my-summary.htm.
Additionally, the sample has the following attributes:

Flowcell ID: FLOW8

Sample ID: SAMP_5

Lane Number: 4

Read Length: 100

Gender: Female

Place your input files in the folder that you have shared with the Virtual Machine.

Downloading index and FA files
The RSeq workflow requires a number of other files, such as precomputed index files and FA files. To simplify downloading, decompressing, and registering these files, we have provided a tool. This tool downloads all required index and FA files based on the sample attributes specified.
PerM: If you want to use the PerM mapper with the workflow.

Flowcell – Flowcell ID
Sample – Sample ID
Lane-Number – Lane number of the sample
Gender – Gender of the sample (M/F)
Read-Length – Read length with which to run the workflow (50/75/100)
Mismatch-Count – Number of mismatches allowed
Parts – Number of parts into which to split the mapping job (for the VM we recommend setting this to 1)
Illumina-Summary-File-Extension – Valid values: .htm or .xml
Bowtie – Valid values: Y (use Bowtie mapper) or N (use PerM mapper)
NOTE: Flowcell, Sample-ID, and Lane-Number should be the same as the ones provided when registering the sample input.
NOTE: Read-Length should be the same as the one provided when registering the PerM index.
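The valid values listed above can be checked before launching the workflow. The following is a minimal sketch, not the workflow's own interface: the RSeqParams class and its field names are hypothetical, the mismatch count of 2 is an arbitrary illustrative value, and the range checks on Lane-Number, Mismatch-Count, and Parts are assumptions rather than documented limits. The other values come from the example sample described earlier.

```python
from dataclasses import dataclass

SUPPORTED_READ_LENGTHS = (50, 75, 100)


@dataclass
class RSeqParams:
    """Hypothetical container for the parameters listed above."""
    flowcell: str
    sample: str
    lane_number: int
    gender: str          # "M" or "F"
    read_length: int     # 50, 75, or 100
    mismatch_count: int
    parts: int
    summary_ext: str     # ".htm" or ".xml"
    bowtie: str          # "Y" = Bowtie mapper, "N" = PerM mapper

    def problems(self):
        """Return a list of problems found; empty if every check passes."""
        issues = []
        if not self.flowcell:
            issues.append("Flowcell must not be empty")
        if not self.sample:
            issues.append("Sample must not be empty")
        if self.lane_number < 1:  # assumption: lanes are numbered from 1
            issues.append("Lane-Number must be a positive integer")
        if self.gender not in ("M", "F"):
            issues.append("Gender must be M or F")
        if self.read_length not in SUPPORTED_READ_LENGTHS:
            issues.append("Read-Length must be 50, 75, or 100")
        if self.mismatch_count < 0:  # assumption: non-negative count
            issues.append("Mismatch-Count must not be negative")
        if self.parts < 1:  # assumption: at least one part
            issues.append("Parts must be at least 1")
        if self.summary_ext not in (".htm", ".xml"):
            issues.append("Illumina-Summary-File-Extension must be .htm or .xml")
        if self.bowtie not in ("Y", "N"):
            issues.append("Bowtie must be Y or N")
        return issues


# Example sample from this guide, run with the PerM mapper in a single part.
params = RSeqParams("FLOW8", "SAMP_5", 4, "F", 100, 2, 1, ".htm", "N")
print(params.problems())  # prints [] because every value is accepted
```

Collecting all problems into a list, rather than stopping at the first one, lets you correct every invalid value in one pass.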