Manual 0.7

0. Before you start, you should know

The pipeline is optimized for mtDNA sequencing data, so there may be problems when running it on other types of data (for example, when the starting position on the reference is not 1, or when there is too much data). We would like to work with you to solve any problems you meet.

Indels are always assigned to the forward strand and are therefore strand-specific. We will solve this problem in a later version.

Code in the dotted box is an example using the Poisson method.

I am particularly interested in detecting low-level mutations in tumor tissues; let me know if you want to initiate a collaboration. I am also looking for a postdoc now.

1. Generating pileup file from sorted SAM file

To build a pileup file, you need the CRISP package (see the CRISP Website; the webpage is sometimes shut down, in which case you can contact its author or me).

All the ssp files should be saved in one folder, and with a specific suffix

perl error_profile_pois.pl
-d folder including all population data (in ssp format)
-s suffix of the ssp files
-i length of the read bins <10bp>
-r length of the reads

perl error_profile_pois.pl -d . -s ssp -i 20 -r 76

The current folder should include all the ssp files (test.ssp, test1.ssp, test2.ssp, ...), and they should all have the suffix specified by [-s].
After running this script, you should have error_pois.index and error_pois.position in the same folder.
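
Before and after running the script, it can help to check the folder contents. The following is only a minimal sketch in Python, not part of the pipeline: the helper names list_ssp_files and missing_outputs are hypothetical, and the only facts taken from this manual are the shared suffix given by [-s] and the two output file names.

```python
from pathlib import Path
import tempfile


def list_ssp_files(folder, suffix="ssp"):
    """Return sorted names of regular files in `folder` ending in '.<suffix>'."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith("." + suffix)
    )


def missing_outputs(folder, expected=("error_pois.index", "error_pois.position")):
    """Return the expected output file names that are not present in `folder`."""
    return [name for name in expected if not (Path(folder) / name).is_file()]


if __name__ == "__main__":
    # Demonstration in a throwaway folder; the file names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("test.ssp", "test1.ssp", "notes.txt"):
            (Path(tmp) / name).touch()
        print(list_ssp_files(tmp))   # input files that share the suffix
        print(missing_outputs(tmp))  # outputs not yet produced
```

An empty list from missing_outputs would suggest that both output files were written; it says nothing about whether their contents are correct.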

All the ssp files should be saved in one folder, and with a specific suffix.

perl error_profile_emp.pl
-d folder including all population data (in ssp format)
-s suffix of the ssp files
-i length of the read bins <10bp>
-m minimum number of samples; otherwise, combine nucleotides
-t minimum coverage in each bin

After running this script, you should have error_emp.index in the same folder.
-t: a bin whose coverage is lower than the value defined by <-t> won't be included in the reference error database
-m: if the number of error rates (reference sample size) in the bin is lower than the value defined by <-m>, different nucleotides will be merged if the t-test p-value > 0.005
If the number of error rates in the bin is lower than 10, the sample will be skipped, as the error rates could be highly biased.
If the number of error rates in the bin is >50 and lower than <-m>, pseudo error rates will be generated assuming a normal distribution.
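
The threshold rules above can be summarized in a small Python sketch. This is an illustration only, not the code of error_profile_emp.pl: the function name bin_actions and the action labels are hypothetical, and because the manual does not say how the scripts combine the rules, the function simply reports every stated condition that holds for one bin.

```python
def bin_actions(n_rates, coverage, min_samples, min_coverage):
    """List which of the documented rules apply to one read bin.

    n_rates      -- number of error rates (reference sample size) in the bin
    coverage     -- coverage of the bin
    min_samples  -- value given to -m
    min_coverage -- value given to -t
    """
    actions = []
    if coverage < min_coverage:
        # -t rule: low-coverage bins are left out of the reference database.
        actions.append("exclude_from_reference")
    if n_rates < 10:
        # Fewer than 10 error rates: skipped, rates could be highly biased.
        actions.append("skip")
    if n_rates < min_samples:
        # -m rule: nucleotides are merged if the t-test p-value > 0.005.
        actions.append("merge_nucleotides_if_ttest_p_gt_0.005")
    if 50 < n_rates < min_samples:
        # Pseudo error rates are generated assuming a normal distribution.
        actions.append("generate_pseudo_error_rates")
    return actions


if __name__ == "__main__":
    # Hypothetical bin: 55 error rates, coverage 100, with -m 60 and -t 50.
    print(bin_actions(55, 100, min_samples=60, min_coverage=50))
```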

This could give a more conservative result unless the error rate is higher than the one you specified; this option could be used to scan common variations.

4.1. With empirical distribution

perl Dreep_emp.pl
-i ssp file
-r error profile file (error_emp.index)
-l length of the read bins (10bp)
-m minimum number of error rate entries in error_emp.index; otherwise, merge different nucleotides at the same position
-s lowest p_value for each bin (the empirical p_value depends on the sample size; <-s> is used to define the lower bound, and 0.01~0.001 would be fine)
-b output Bias instead of Phred-scaled Quality Score
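
To show why a lower bound such as <-s> matters for a Phred-scaled score, here is a minimal Python sketch. It is not the code of Dreep_emp.pl. The function names are hypothetical, and the empirical p_value is assumed here to be the fraction of reference error rates at least as large as the observed one, which the manual does not confirm. The Phred scaling itself is the standard Q = -10 * log10(p).

```python
import math


def empirical_p_value(observed, reference_rates, lower_bound=0.01):
    """Fraction of reference error rates >= observed, floored at lower_bound.

    The floor plays the role of <-s>: with a finite reference sample the
    raw fraction can reach 0, which has no finite Phred score.
    """
    if not reference_rates:
        raise ValueError("reference_rates must not be empty")
    n_ge = sum(1 for r in reference_rates if r >= observed)
    return max(n_ge / len(reference_rates), lower_bound)


def phred(p):
    """Standard Phred scaling of a probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


if __name__ == "__main__":
    # Made-up reference error rates for one bin.
    reference = [0.001, 0.002, 0.003, 0.004]
    p = empirical_p_value(0.01, reference, lower_bound=0.01)
    print(p, phred(p))
```

With a lower bound of 0.01 the score is capped near 20, and with 0.001 near 30, so the choice of <-s> sets the highest quality score a bin can report under this assumption.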

An output file with the suffix pois.log/emp.log will be generated for each sample.