Some of the algorithm's key parameters are described in detail in our publication.

| Argument | Description | Suggested Value |
| --- | --- | --- |
| matlab_dir | The directory where Matlab or the MCR is installed. | |
| sample_name | The name of the sample, used for setting the output file names. | Can be any string valid for a file name |
| rearrangement_predictions | Input text file with the rearrangements predicted by PEM. It is a tab-delimited text file with a header row, containing the columns 'chr1', 'chr2', 'pos1', 'pos2', 'str1', 'str2', which represent the predicted rearrangement (chromosome, position, and strand for the two breakpoints), and a 'tumreads' column, which gives the number of discordant read pairs supporting the prediction. | |
| bam_file | The BAM file of the sample. The folder containing it should also contain its index. | |
| lane_black_list | A text file listing lanes that you wish to omit. If you have none, specify "none" (suggested). | none |
| refdir | A directory containing the genome as text files named chr1.txt, chr2.txt, chr3.txt, ..., chrX.txt, each holding the content of one chromosome as a single line. | |
| insertionsize | Average insertion size, or a range of likely insertion sizes (in bp). | 400 |
| confidence_thres | The number of supporting discordant read pairs above which the high-confidence parameters (below) are used. | 6 |
| low_confidence_sidewithread | The estimated error in the predicted locus before the breakpoint, that is, on the side the supporting reads are on, when the number of supporting discordant pairs is low. | 80 |
| low_confidence_sidewithoutread | The estimated error in the predicted locus after the breakpoint, when the number of supporting discordant pairs is low. | 200 |
| high_confidence_sidewithread | The estimated error in the predicted locus before the breakpoint, that is, on the side the supporting reads are on, when the number of supporting discordant pairs is high. | 40 |
| high_confidence_sidewithoutread | The estimated error in the predicted locus after the breakpoint, when the number of supporting discordant pairs is high. | 50 |
| max_mismatches | The maximal percentage of mismatches allowed for a read to be considered partially aligned (and hence possibly spanning the breakpoint). | 30 (for MAQ), 80 (for BWA) |
| tipsize | The size (in bp) of the read tip that is checked for partial alignment. | 7 (for MAQ), 15 (for BWA) |
| min_mismatches_in_tip | The minimal number of mismatches required in a tip for it to be considered a mismatched tip, which makes the read a candidate partially aligned read (and hence possibly spanning the breakpoint). | 2 (for MAQ), 5 (for BWA) |
| max_N | The maximal number of N bases allowed in a read for it to be considered partially aligned (and hence possibly spanning the breakpoint). | 10 |
| expand_pairs_extraction | Window size around the predicted breakpoint in which to look for split reads. | 500 |
| max_reads_fished | Maximal number of reads to pull when looking for split reads. | 100000 |
| readlen | The read length (in bp). | |
| align_enough_reads | BreakPointer stops aligning once this number of split reads supporting the same breakpoint has been found. | 20 |
| split_penalty | The penalty applied to the BreakPointer score for splitting a read. | 8 |
| min_qual | The minimal score a split read needs to be considered (normalized by read length). | 0.75 |
| libdir | The path to a directory with GrabSplitReads.jar, align_each_bkpt5.bin, and links to BWA and Samtools. | |
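
To illustrate how the rearrangement_predictions header and the confidence parameters listed above fit together, here is a minimal Python sketch. It is not part of BreakPointer: the function names `read_predictions` and `error_margins` are hypothetical, the example rows (including the strand encoding) are made up, and "above which" is read here as strictly greater than confidence_thres, which is an assumption about the actual behavior.

```python
import csv
import io

# Column names required in the rearrangement_predictions header row.
REQUIRED_COLUMNS = ("chr1", "chr2", "pos1", "pos2", "str1", "str2", "tumreads")

# Suggested values for the confidence parameters.
CONFIDENCE_THRES = 6
LOW_CONFIDENCE = {"sidewithread": 80, "sidewithoutread": 200}
HIGH_CONFIDENCE = {"sidewithread": 40, "sidewithoutread": 50}


def read_predictions(handle):
    """Read a tab-delimited predictions file and check its header (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return list(reader)


def error_margins(tumreads, thres=CONFIDENCE_THRES):
    """Pick the error estimates for a prediction from its supporting pair count.

    Assumption: the high-confidence values apply only when the count is
    strictly greater than the threshold.
    """
    return HIGH_CONFIDENCE if tumreads > thres else LOW_CONFIDENCE


# Made-up example input; the strand values are placeholders, not a documented encoding.
example = io.StringIO(
    "chr1\tchr2\tpos1\tpos2\tstr1\tstr2\ttumreads\n"
    "1\t5\t1000\t2000\t0\t1\t4\n"
    "2\t2\t3000\t9000\t1\t0\t12\n"
)
for row in read_predictions(example):
    margins = error_margins(int(row["tumreads"]))
    print(row["chr1"], row["pos1"], row["chr2"], row["pos2"], margins)
```

A check like this can catch a misnamed or missing column before a long run starts. The first example row has 4 supporting pairs and so gets the low-confidence values, while the second has 12 and gets the high-confidence values.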