ERROR A GATK RUNTIME ERROR has occurred (version 2.4-9-g532efad):

I aligned whole-genome data (~40X coverage) with bwa 0.7.3a using the bwa mem algorithm, followed by indel realignment, mate-pair fixing, and duplicate removal with GATK 2.4-9-g532efad and Picard tools (1.86). The outputs of all these steps look fine. However, I run into a problem at the base recalibration step, which runs for about two hours on a cluster with the -nct 8 option before failing at the following position:
INFO 16:38:42,214 ProgressMeter - 3:172666449 2.61e+08 2.0 h 27.0 s 21.4% 9.4 h 7.4 h
INFO 16:39:12,375 ProgressMeter - 3:175261930 2.63e+08 2.0 h 27.0 s 21.5% 9.4 h 7.4 h
INFO 16:39:21,706 GATKRunReport - Uploaded run statistics report to AWS S3

I validated the BAM file produced at the duplicate-removal step and it is fine. It has 1.25 billion reads.
In response to an earlier error report, Geraldine suggested using "-rf MateSameStrand". When I use the filter, the base recalibrator finishes in 1.5 hours with the -nct 8 option and without an error. I am printing about the last dozen lines of the output:

With the "-rf MateSameStrand" filter, the base recalibrator processed only 11.6 M reads. What happened to the other reads (I have 1.25 billion in my BAM file)? I do not understand what is going on at "MicroScheduler": it shows "34797156 reads were filtered out during traversal out of 34997160 total (99.43%)". How did it get (34.99 - 11.6 M) reads? I was also surprised by "28627579 reads (81.80% of total) failing MateSameStrandFilter". I thought there might be something wrong with my BWA alignment, but it looks fine. Here is a sample output from bwa:

I also tried running the nightly GATK build and I keep getting the following message:

ERROR MESSAGE: Timeout of 30000 milliseconds was reached while trying to acquire a lock on file /home/mgujral/broad_bundles/2.3/b37/dbsnp_137.b37.vcf.idx. Since the GATK uses non-blocking lock acquisition calls that are not supposed to wait, this implies a problem with the file locking support in your operating system.

I have the following questions:

1) Does my base recalibrator output with "-rf MateSameStrand" look anywhere near acceptable?
2) Should I just wait for release 2.5 and try again?

Best Answer

Based on the run summaries you posted, I am concerned that your mapping and data quality may not be as good as you think. The MateSameStrand issue (which is usually not a big deal) may be covering up some other issue. You also have over 17% of reads failing the mapping quality zero filter, which means that about a sixth of your reads have a mapping quality of zero, i.e. they are unmapped or could not be placed unambiguously. That's a lot, and it's not a good sign regarding data quality. You may want to do some additional QC analysis to get a clear picture of your data quality.
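As a quick sanity check on the run summary quoted in the question, the percentages can be recomputed from the raw counts. This is only a minimal sketch in Python using the three numbers posted above; the variable and function names are made up for illustration and nothing here comes from GATK itself.

```python
# Counts copied from the MicroScheduler lines quoted in the question.
total_reads = 34_997_160                 # reads seen during traversal
filtered_reads = 34_797_156              # reads removed by all filters combined
mate_same_strand_failures = 28_627_579   # reads failing MateSameStrandFilter


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


filtered_pct = percent(filtered_reads, total_reads)
same_strand_pct = percent(mate_same_strand_failures, total_reads)
surviving_reads = total_reads - filtered_reads

print(f"filtered out:            {filtered_pct:.2f}%")
print(f"MateSameStrand failures: {same_strand_pct:.2f}%")
print(f"reads left after filters: {surviving_reads}")
```

The recomputed percentages agree with the ones in the log, and by that same summary line only about 200 thousand reads pass every filter, which is why the run summary deserves a closer look before the recalibration output is trusted.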

Also, I see you used BWA mem 0.7.3 to map your data. Early versions of BWA mem were pretty buggy and caused multiple known issues. The developer of BWA recently released a new version (0.7.4) containing many bug fixes. I would recommend that you repeat the mapping with the latest version of bwa and repeat the processing with GATK 2.5 once it's out.
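For the additional QC suggested above, one rough starting point is to tally a few properties straight from the SAM records. The sketch below is an assumption-laden illustration, not GATK's filter code: it uses the FLAG bits and MAPQ column defined in the SAM specification, counts a read as "mate same strand" when the read is paired, both mates are mapped, and both have the same orientation, and the four example records are invented purely to exercise the function.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20


def tally_sam_lines(lines):
    """Count unmapped, MAPQ-0 and same-strand-mate reads in SAM text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: FLAG
        mapq = int(fields[4])   # column 5: MAPQ
        counts["total"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif mapq == 0:
            counts["mapq0_mapped"] += 1  # mapped, but placement is ambiguous
        both_mapped = (
            bool(flag & PAIRED)
            and not flag & UNMAPPED
            and not flag & MATE_UNMAPPED
        )
        if both_mapped and bool(flag & REVERSE) == bool(flag & MATE_REVERSE):
            counts["mate_same_strand"] += 1
    return counts


# Invented records for illustration only (not taken from the poster's data).
example_records = [
    "r1\t99\t3\t100\t60\t50M\t=\t300\t250\t*\t*",   # forward read, reverse mate
    "r2\t65\t3\t100\t0\t50M\t=\t300\t250\t*\t*",    # both forward, MAPQ 0
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # read and mate unmapped
    "r4\t113\t3\t100\t30\t50M\t=\t300\t250\t*\t*",  # both reverse
]
example_counts = tally_sam_lines(example_records)
print(dict(example_counts))
```

On real data the same function could be fed the text output of samtools view for a single chromosome or a subsample. Keeping unmapped reads separate from mapped reads with MAPQ 0 helps show whether the problem is reads that failed to align at all or reads that aligned ambiguously.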
