The first task is to create a config file (chanjo.yaml) and prepare the database. Chanjo walks you through the setup when you run:

$ chanjo init
$ chanjo db setup

Note

Chanjo uses project-level config files by default. This means that it looks for a chanjo.yaml file in the directory where you execute your command. You can also point to a different config file using the chanjo -c /path/to/chanjo.yaml option.

One important thing to note is that Chanjo considers coverage across exonic regions of the genome. It’s perfectly possible to compose your own list of intervals. Just make sure to follow the BED conventions (http://genome.ucsc.edu/FAQ/FAQformat.html#format1). You then add two additional columns that define the relationships between exons and transcripts, and between transcripts and genes.

If an exon belongs to multiple transcripts, you define a list of transcript ids and an equal number of gene identifiers to match.
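As a rough illustration of that layout, here is a minimal Python sketch that reads one extended BED line and pairs each transcript id with its gene identifier. It is not part of Chanjo. The tab-separated columns, the comma as list separator, the `parse_exon_line` helper, and the example ids are all assumptions made for this sketch.

```python
from typing import List, NamedTuple, Tuple


class Exon(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: str
    strand: str
    # One (transcript id, gene id) pair per transcript the exon belongs to.
    links: List[Tuple[str, str]]


def parse_exon_line(line: str, list_sep: str = ",") -> Exon:
    """Parse one line: six BED columns followed by transcript and gene lists.

    The comma list separator is an assumption for this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    chrom, start, end, name, score, strand, transcripts, genes = fields[:8]
    transcript_ids = transcripts.split(list_sep)
    gene_ids = genes.split(list_sep)
    # The two lists must line up: one gene identifier per transcript id.
    if len(transcript_ids) != len(gene_ids):
        raise ValueError("transcript and gene lists differ in length")
    return Exon(chrom, int(start), int(end), name, score, strand,
                list(zip(transcript_ids, gene_ids)))


# Hypothetical exon shared by two transcripts of the same gene.
example = "1\t69089\t70007\t1-69090-70007\t0\t+\tTX1,TX2\tGENE1,GENE1"
exon = parse_exon_line(example)
```

Checking that the two lists have the same length up front catches malformed lines before they reach the database.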

Let’s tell Chanjo which exons belong to which transcripts and which transcripts belong to which genes. It’s fine to use the output from Sambamba as long as the two columns after “strand” are present in the file.

The SQL schema has been designed to be a powerful tool on its own for studying coverage. It lets you quickly aggregate metrics across multiple samples and can be used as a general coverage API for accompanying tools.
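To give a feel for this kind of aggregation, the following sketch computes the average coverage per sample with plain SQL. It uses an in-memory SQLite database and a made-up table; the table name `demo_transcript_stat`, its columns, and the numbers are hypothetical and do not reflect Chanjo's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical table (NOT Chanjo's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_transcript_stat ("
    "sample_id TEXT, transcript_id TEXT, mean_coverage REAL)"
)

# Made-up metrics for two samples across two transcripts.
rows = [
    ("sample1", "TX1", 30.0),
    ("sample1", "TX2", 50.0),
    ("sample2", "TX1", 20.0),
    ("sample2", "TX2", 40.0),
]
conn.executemany("INSERT INTO demo_transcript_stat VALUES (?, ?, ?)", rows)

# Aggregate across transcripts to get one average per sample.
query = (
    "SELECT sample_id, AVG(mean_coverage) "
    "FROM demo_transcript_stat "
    "GROUP BY sample_id ORDER BY sample_id"
)
per_sample = conn.execute(query).fetchall()
conn.close()
```

The same pattern should carry over to a real Chanjo database once you substitute the table and column names from its schema.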

One example of such a tool is Chanjo-Report, a coverage report generator for Chanjo output.