01 January 2011

The Variant Call Format (VCF) is a text file format generated by many NGS tools. It contains meta-information lines, a header line, and then data lines describing how the mutations were called. I don't like this format because, unlike JSON or XML, it cannot store hierarchical annotations; nevertheless, it is a de facto standard.
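To make the three kinds of lines concrete, here is a minimal sketch in Java. It is not the code of vcfannotator: the class name, the method names and the sample record are made up for illustration. It assumes meta-information lines start with "##", the header line starts with a single "#", and data lines are tab-separated. Blank lines are skipped rather than treated as malformed records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of vcfannotator).
final class VcfLineSketch {
    /** The kinds of line a VCF file can contain, plus BLANK for empty rows. */
    enum Kind { META, HEADER, DATA, BLANK }

    static Kind classify(String line) {
        if (line.trim().isEmpty()) return Kind.BLANK;   // e.g. an empty row at the end of the file
        if (line.startsWith("##")) return Kind.META;    // meta-information line
        if (line.startsWith("#")) return Kind.HEADER;   // column header line
        return Kind.DATA;                               // one called variant
    }

    /** Splits every data line into its tab-separated columns, ignoring all other lines. */
    static List<String[]> dataRows(String vcfText) {
        List<String[]> rows = new ArrayList<>();
        for (String line : vcfText.split("\\R")) {
            if (classify(line) == Kind.DATA) {
                rows.add(line.split("\t"));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up record: the position and alleles are not real data.
        String vcf = "##fileformat=VCFv4.0\n"
                + "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
                + "chr1\t12345\t.\tA\tG\t50\tPASS\t.\n"
                + "\n";
        for (String[] row : dataRows(vcf)) {
            System.out.println(row[0] + ":" + row[1] + " " + row[3] + ">" + row[4]);
        }
    }
}
```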

I wrote a tool called vcfannotator to append a set of annotations from the UCSC database to a VCF file. As I wanted to keep this tool simple and without any dependencies, it only uses the flat files available from the download area at the UCSC.

This tool appends several pieces of information:

A prediction of the effect of the mutation: is it in the cDNA? In an intron? In an exon? Is it a non-synonymous mutation? Was a stop codon lost or gained? Is there any consequence for the splicing process? Here, the UCSC DAS server and the knownGene table are used to retrieve the genomic DNA and the structure of the gene.

Interestingly, for this last mutation (chr1:113068276), there are two transcripts at the same position with two different translation frames, so there are two predictions: one synonymous mutation and one non-synonymous mutation.
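The kind of prediction described above can be sketched as follows. This is an illustration under simplifying assumptions, not the code of vcfannotator: the class, the method names and the two coding sequences are invented, the coding sequence is assumed to be already spliced and on the forward strand, and only single-base substitutions are handled. The same substitution is applied to two hypothetical transcripts in which the base falls at a different codon position, which is how one variant can receive two different predictions.

```java
// Hypothetical sketch, for illustration only (not part of vcfannotator).
final class CodonEffectSketch {
    private static final String BASES = "TCAG";
    // Standard genetic code, codons enumerated in T, C, A, G order at each position.
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Translates one codon; returns '?' if it contains a base other than A, C, G or T. */
    static char translate(String codon) {
        int index = 0;
        for (int i = 0; i < 3; i++) {
            int b = BASES.indexOf(Character.toUpperCase(codon.charAt(i)));
            if (b < 0) return '?';
            index = index * 4 + b;
        }
        return AMINO_ACIDS.charAt(index);
    }

    /**
     * Classifies a substitution in a coding sequence.
     * cds: spliced coding sequence, forward strand (assumption).
     * pos0: 0-based index of the mutated base within cds.
     * alt: the substituted base.
     */
    static String classify(String cds, int pos0, char alt) {
        int codonStart = pos0 - (pos0 % 3);
        String wildCodon = cds.substring(codonStart, codonStart + 3);
        StringBuilder mutated = new StringBuilder(wildCodon);
        mutated.setCharAt(pos0 - codonStart, alt);
        char wildAa = translate(wildCodon);
        char mutAa = translate(mutated.toString());
        if (wildAa == mutAa) return "synonymous";
        if (mutAa == '*') return "stop gained";
        if (wildAa == '*') return "stop lost";
        return "non-synonymous";
    }

    public static void main(String[] args) {
        // Made-up transcripts: the G>A change hits the third base of a codon in
        // the first one and the first base of a codon in the second one.
        System.out.println(classify("ATGCTGTAA", 5, 'A'));
        System.out.println(classify("ATGGCTGTCTAA", 6, 'A'));
    }
}
```

In the first transcript the codon CTG becomes CTA, which still encodes leucine; in the second, GTC (valine) becomes ATC (isoleucine). A real annotator would also have to map the genomic position to the transcript, handle the reverse strand and check splice sites.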

Seems like a very useful tool! However, when I try to run it, I get the following error:

java.io.IOException: illegal number of columns in
at sandbox.VCFFile.read(VCFAnnotator.java:521)
at sandbox.VCFFile.parse(VCFAnnotator.java:576)
at sandbox.VCFAnnotator.main(VCFAnnotator.java:2778)

My VCF file is an output of GATK, so I think the format should be fine. I already tried with a few lines that are the same as in your example, but it still throws this error. It might be a stupid question, but do you know what I'm doing wrong? Thanks!

Dear Pierre, Sorry for my question yesterday. Very stupid indeed. Apparently my VCF-file contains an empty row at the bottom of the file and this is the reason why vcfannotator wouldn't work. It works fine now. Thanks for this great tool!