BaseRecalibrator Error

When running BaseRecalibrator with my own selected SNPs, I got the following stack trace error:

ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.isLowQualityBase(BaseRecalibrator.java:205)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:228)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

Thank you for the fast answer. I changed to the latest version, but unfortunately the error remained the same.

ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.isLowQualityBase(BaseRecalibrator.java:205)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:228)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

The output ended at Scf2113, but I assume the error is in the following scaffolds.
I created a SAM file of scaffolds 2113 to 2123 using samtools view. I hope that's what you are looking for.
Thank you for testing!

Thank you for your effort! I reran the files with the newest version. The error persists in the one file I sent you (same error, same place):

ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
at org.broadinstitute.sting.utils.recalibration.ReadCovariates.getKeySet(ReadCovariates.java:31)
at org.broadinstitute.sting.gatk.walkers.bqsr.AdvancedRecalibrationEngine.updateDataForPileupElement(AdvancedRecalibrationEngine.java:71)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:244)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR A GATK RUNTIME ERROR has occurred (version 2.1-12-ga99c19d):

However, three other alignment files worked well, so I am starting to suspect the error is due to the file, not the program. Unfortunately, I need the failing file to create a reference SNP set. Do you have any other suggestions about the cause of the error?

Unfortunately I wasn't able to run the GATK on your sam file (I think it is missing its header?), but I was able to fix another problem in the BaseRecalibrator related to your reads. Hopefully this will fix your issue. Version 2.1-13 should appear on the website later today.
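The answer above suspects that the uploaded SAM file lacks its header. samtools view prints only the alignment records unless -h is given, so a region extracted that way has no @HD or @SQ lines. Below is a minimal sketch, assuming a plain-text SAM file, that checks for a usable header before sharing the file. The function names are hypothetical helpers and not part of GATK or samtools.

```python
def sam_header_summary(lines):
    """Count header lines by record type (@HD, @SQ, @RG, @PG, @CO).

    SAM header lines must precede all alignment records, so scanning
    stops at the first line that does not start with '@'.
    """
    counts = {}
    for line in lines:
        if not line.startswith("@"):
            break  # first alignment record: any header is already over
        tag = line[:3]
        counts[tag] = counts.get(tag, 0) + 1
    return counts


def has_usable_header(lines):
    """True if the SAM text declares at least one reference sequence (@SQ)."""
    return sam_header_summary(lines).get("@SQ", 0) > 0


if __name__ == "__main__":
    with_header = [
        "@HD\tVN:1.0\tSO:coordinate",
        "@SQ\tSN:Scf2113\tLN:5000",
        "r1\t0\tScf2113\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    without_header = with_header[2:]  # what samtools view emits without -h
    print(has_usable_header(with_header))     # True
    print(has_usable_header(without_header))  # False
```

To check a real file, pass an open handle: has_usable_header(open("scaffolds.sam")). The file name here is hypothetical. Re-extracting with samtools view -h keeps the header.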

I have encountered a problem similar to the one described in this thread, but I can't find a solution here.
When I run BaseRecalibrator with a VCF file produced from my own BAM file, it shows a stack trace error (full error message below). My study species is a non-model organism, so I have no known SNP sites. Instead I repeat the UnifiedGenotyper, BaseRecalibrator, PrintReads cycle starting from the original BAM file. BaseRecalibrator ran fine in the first round but failed in the second. The GATK version is 2.3-0. I encountered the same problem with two different BAM files.
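The bootstrapping loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an official GATK recipe:

- It only builds command lines.
- All file names are hypothetical.
- The arguments (-T, -R, -I, -o, -knownSites, -BQSR) assume GATK 2.x syntax, so check them against the documentation for your version.

Each round calls SNPs on the previous round's output but always recalibrates the original BAM. Every step uses the same jar, which matters given how this poster's error was eventually resolved.

```python
# Hypothetical paths: substitute your own reference, BAM, and GATK jar.
GATK_JAR = "GenomeAnalysisTK.jar"  # use ONE GATK version for every step
REFERENCE = "reference.fa"
ORIGINAL_BAM = "original.bam"


def gatk(tool, *args):
    """Build a GATK 2.x-style command line as an argument list (assumed syntax)."""
    return ["java", "-jar", GATK_JAR, "-T", tool, "-R", REFERENCE, *args]


def bootstrap_rounds(n_rounds):
    """Return the commands for n rounds of bootstrapped recalibration.

    SNPs are called on the previous round's recalibrated BAM (the original
    BAM in round 1), but BaseRecalibrator and PrintReads always read the
    ORIGINAL BAM, so recalibrations are never stacked on each other.
    """
    commands = []
    calling_bam = ORIGINAL_BAM
    for r in range(1, n_rounds + 1):
        vcf, table, bam = f"round{r}.vcf", f"round{r}.grp", f"round{r}.bam"
        commands.append(gatk("UnifiedGenotyper", "-I", calling_bam, "-o", vcf))
        commands.append(gatk("BaseRecalibrator", "-I", ORIGINAL_BAM,
                             "-knownSites", vcf, "-o", table))
        commands.append(gatk("PrintReads", "-I", ORIGINAL_BAM,
                             "-BQSR", table, "-o", bam))
        calling_bam = bam  # next round calls SNPs on this round's output
    return commands


if __name__ == "__main__":
    for cmd in bootstrap_rounds(2):
        print(" ".join(cmd))
```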

The error message:

ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -4
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

Thanks for reporting this. The problem seems to be that you have either 1) a mixture of well-encoded and mis-encoded reads in your file, or 2) base qualities that are extremely poorly calibrated and span too large a range. I will add a patch (available in version 2.4) that exits more gracefully with a better error message, but unfortunately it's not going to help you. You need to go back and fix this at the source, because something is wrong with your data. Good luck, and sorry to be the bearer of bad news.
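One rough way to look for the first cause, a mixture of encodings, is to classify each read's quality string by its lowest character and count the classes. This is a heuristic sketch, not a GATK feature. The thresholds are assumptions based on the usual offsets:

- Phred+33 starts at '!' (ASCII 33).
- Phred+64 starts at '@' (ASCII 64).
- Solexa+64 can go down to ';' (ASCII 59).

```python
def classify_quality_string(quals):
    """Heuristically classify one read's SAM QUAL string.

    - 'absent'   : '*' (no qualities stored) or empty
    - 'invalid'  : a character below '!' (ASCII 33), never legal in SAM
    - 'phred33'  : a character below ';' (ASCII 59), impossible under
                   Phred+64 or Solexa+64 encoding
    - 'phred64?' : every character is '@' (ASCII 64) or higher; typical of
                   Phred+64 but also possible for a very high-quality
                   Phred+33 read, hence the question mark
    - 'ambiguous': anything else
    """
    if quals in ("", "*"):
        return "absent"
    lo = min(ord(c) for c in quals)
    if lo < 33:
        return "invalid"
    if lo < 59:
        return "phred33"
    if lo >= 64:
        return "phred64?"
    return "ambiguous"


def encoding_census(quality_strings):
    """Count reads per class; several large classes suggest mixed encodings."""
    counts = {}
    for q in quality_strings:
        label = classify_quality_string(q)
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    print(encoding_census(["II5I", "hhhgg", "!\"#"]))
```

A file in which both 'phred33' and 'phred64?' reads are common would fit the "mixture" explanation above. A single class points back to the calibration explanation.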

I figured out the problem, and other users may be interested to know. I had mixed two versions of GATK in the analysis of this data set: I used GATK 2.1 for local realignment and GATK 2.3 (once it became available) for base quality recalibration. When I redid all the analyses with GATK 2.3, the problem was solved.
Best,

On the other hand, I'm getting a different out-of-range index; I'm not sure if that gives you any information:

java.lang.ArrayIndexOutOfBoundsException: -6
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)

I noticed in the 1000 Genomes supplementary information that GATK was used to detect variants only on Illumina data. Is that just a coincidence, or did you have issues with the 1000 Genomes (g1k) SOLiD data?

Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. -1: SRR097794.39113769

This happens because the aligner may report more than one possible mapping position for a read. But is that actually incorrect? The data comes from the 1000 Genomes Project. To rule out the possibility that my own manipulations introduced an error, I reproduced the error again with the raw data. Is there some other malformation?
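A plausible reading of the PairInfoMap error, offered as an assumption rather than a confirmed diagnosis, is that one end of a read pair appears as more than one primary record. Per the SAM specification, extra placements of the same read should carry the secondary (0x100) or supplementary (0x800) flag. The minimal sketch below lists read ends with duplicate primary records in plain-text SAM. The function name is hypothetical.

```python
from collections import Counter

SECONDARY = 0x100       # SAM flag: secondary alignment
SUPPLEMENTARY = 0x800   # SAM flag: supplementary alignment
FIRST_IN_PAIR = 0x40    # SAM flag: first segment in template
SECOND_IN_PAIR = 0x80   # SAM flag: last segment in template


def duplicate_primary_records(sam_lines):
    """Return (read name, end) keys with more than one primary record.

    end is 1 or 2 for paired reads and 0 for unpaired ones. Secondary and
    supplementary records are skipped because they may legitimately repeat.
    """
    seen = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        end = 1 if flag & FIRST_IN_PAIR else 2 if flag & SECOND_IN_PAIR else 0
        seen[(qname, end)] += 1
    return sorted(key for key, n in seen.items() if n > 1)


if __name__ == "__main__":
    demo = [
        "r1\t99\tchr22\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "r1\t147\tchr22\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
        "r2\t99\tchr22\t300\t60\t4M\t=\t400\t104\tACGT\tIIII",
        "r2\t99\tchr22\t900\t0\t4M\t=\t400\t104\tACGT\tIIII",   # unflagged extra
        "r2\t355\tchr22\t950\t0\t4M\t=\t400\t104\tACGT\tIIII",  # 355 = 99 | 0x100
    ]
    print(duplicate_primary_records(demo))  # [('r2', 1)]
```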

This is the command now (I had to add --fix_misencoded_quality_scores):

java.lang.ArrayIndexOutOfBoundsException: -8
at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)

The data comes from the 1000 Genomes Project repository; I only extracted chromosome 22 to work with a smaller file.

You should go back to the 1000 Genomes Project and make sure you are downloading the correct file, because all of the base qualities in the file you uploaded to us were mis-encoded. The minimum valid value is ASCII 33 ('!'), but you had values lower than that. At this point the problem is not with the GATK, so there's really nothing else we can do to help here. Good luck!

Yes, sorry for the malformed file. I saw the mis-encoded base call qualities but did not want to distract from the main point; now I see they are related. In fact, the base call qualities were mis-encoded by GATK's "--fix_misencoded_quality_scores" option, which I applied by mistake, and this may be causing the error shown above.
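Here is why the option is harmful on data that is already Phred+33. A misencoding fix has to shift every score down by 31, the gap between the Phred+64 and Phred+33 offsets; we assume GATK's option does exactly this. Applied to correctly encoded Phred+33 data, every base below Q31 then becomes negative. The sketch shows this arithmetic on the first characters of the quality string quoted in this post; the helper names are hypothetical.

```python
PHRED33_OFFSET = 33     # '!' encodes Q0 in Phred+33
MISENCODING_SHIFT = 31  # 64 - 33: gap between Phred+64 and Phred+33 offsets


def decode_phred33(quals):
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(c) - PHRED33_OFFSET for c in quals]


def shift_as_if_phred64(scores):
    """Assumed effect of a Phred+64 -> Phred+33 fix: subtract 31 from each score."""
    return [q - MISENCODING_SHIFT for q in scores]


if __name__ == "__main__":
    scores = decode_phred33('""3\'"I')
    print(scores)                       # [1, 1, 18, 6, 1, 40]
    print(shift_as_if_phred64(scores))  # [-30, -30, -13, -25, -30, 9]
```

Negative scores can no longer be written as legal SAM characters (they fall below ASCII 33), which matches the maintainer's observation about the uploaded file.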

Why did I use it? Because when running without this parameter I got the following message:

##### ERROR MESSAGE: SAM/BAM file
SAMFileReader{/home/priesgo/data/sequences/1000G_releases/20110521/NA12814/exome_alignment/NA12814.22.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 63; please see the GATK --help documentation for options related to this error

I found this suggested as a possible solution by @ymv in this entry, but it does not seem to apply to my case.

So, let's look at this encoded base call quality string:

""3'"IUC34;U\FMI5I\]]_`L<FYZY_^_`\@

The character "`" (ASCII 96) corresponds to Phred 63 under Phred+33 encoding (96 - 33 = 63), and translating some of the other characters gives:

1 1 18 6 1 40 52 ... 60 60 62 63 43 ... 62 63 59 31
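The translation above can be checked mechanically by subtracting 33 from each character's ASCII code. This is a minimal sketch; the string is copied from this post, with backslashes escaped for Python.

```python
# Quality string as quoted in this post (backslashes escaped for Python).
QUALS = '""3\'"IUC34;U\\FMI5I\\]]_`L<FYZY_^_`\\@'


def decode_phred33(quals):
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(c) - 33 for c in quals]


if __name__ == "__main__":
    scores = decode_phred33(QUALS)
    print(len(scores), min(scores), max(scores))  # 35 1 63
    print(scores[:7], scores[-4:])  # [1, 1, 18, 6, 1, 40, 52] [62, 63, 59, 31]
```

Every decoded score lies between 1 and 63, so none falls below zero. The string is valid Phred+33 even though its top scores are unusually high.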

As we can see, the values are correctly distributed in the range 1 to 63. So my question is: is there a way to compress this base call quality range in GATK?

Sorry for the long post, and thanks again. By the way, this might be better as a separate post...