HaplotypeCaller/VariantAnnotator: no allele balance tag for any SNP

I couldn't find the AlleleBalance and AlleleBalanceBySample tags in my VCF outputs. The tags are not present for even a single variant.
I tried HaplotypeCaller with -all, and directly with -A AlleleBalance or -A AlleleBalanceBySample.
I also tried VariantAnnotator with -all, -A AlleleBalance, or -A AlleleBalanceBySample.

I should have been a little more careful about this.
I called my variants using HaplotypeCaller, which takes about 3 days per sample.
Now I need the AB values for each sample. Is it possible to calculate the AB fields and add them to the VCF file after variant calling?
I found a Python script online for the job, https://gist.github.com/mjclark/1057839, but it's failing, which I think has to do with how VCFs are handled in GATK vs. this program.

I was trying to get the AlleleBalanceBySample information into my VCF, but for some reason I cannot find the option to create it with VariantAnnotator.
I have a VCF file with all my samples, and I have AD for each sample. However, I would like to have a per-sample field indicating the proportion of each allele (in case of heterozygosity), so that I can filter out false-positive variants per sample.
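For readers in the same situation, the per-sample allele proportions described above can be derived from the AD field alone. The following is a minimal sketch, not GATK's own AlleleBalanceBySample implementation: the function names `allele_fractions` and `sample_allele_fractions` are hypothetical, and it assumes a tab-delimited VCF data line whose FORMAT column contains AD.

```python
def allele_fractions(ad_field):
    """Turn an AD value such as "12,8" into per-allele read fractions.

    Returns None when no reads are recorded (for example AD is ".").
    """
    depths = [0 if x == "." else int(x) for x in ad_field.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return [d / total for d in depths]


def sample_allele_fractions(vcf_line):
    """Return one list of allele fractions (or None) per sample column."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # FORMAT column, e.g. GT:AD:DP
    samples = fields[9:]
    if "AD" not in format_keys:
        return [None] * len(samples)
    ad_index = format_keys.index("AD")
    result = []
    for sample in samples:
        values = sample.split(":")
        # VCF allows trailing FORMAT fields to be dropped for a sample.
        if ad_index >= len(values):
            result.append(None)
        else:
            result.append(allele_fractions(values[ad_index]))
    return result
```

Each returned list is ordered like AD itself (reference allele first, then each ALT allele), so for a heterozygous 0/1 call the second entry is the fraction of reads supporting the alternate allele.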

What I was trying to say is that previous documentation says we can annotate a VCF with AlleleBalanceBySample information using VariantAnnotator. However, I don't find this option when running the VariantAnnotator tool.

By the way, I also don't understand how to supply my BAM files, given that I have one BAM file per sample and my VCF is the output of GenotypeGVCFs after calling with HaplotypeCaller. Should I split it by sample again?

Now I was able to do this step with this command:
java -jar /GenomeAnalysisTK.jar -T VariantAnnotator -R hg19.fa -I BAM.list -V ALL_samples_onlySNP_filter.vcf --dbsnp dbsnp_138.hg19.vcf -alwaysAppendDbsnpId -A AlleleBalanceBySample -o ALL_samples_onlySNP_annotated.vcf
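The command above passes all per-sample BAMs through a single BAM.list file. A minimal sketch of building such a file follows, assuming (as the command suggests) that -I accepts a plain-text file with one BAM path per line. The helper name `write_bam_list` and the example paths are hypothetical.

```python
from pathlib import Path


def write_bam_list(bam_paths, list_path):
    """Write one BAM path per line to list_path and return the number written."""
    lines = [str(Path(p)) for p in bam_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return len(lines)


# Example usage (hypothetical file names):
# write_bam_list(["sampleA.bam", "sampleB.bam"], "BAM.list")
```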

I am sorry, I did not understand the AlleleBalance step at first: it did not work for my INDEL VCF, but it worked with SNPs.

Why does this function not work with INDELs? For most subjects, my INDEL variants do not show a heterozygosity of 0.5 (normally it is 0.2). Does this mean they are false variants? Or do you recommend a different minimum allele frequency threshold for INDELs?

@Shelia
Hi Sheila,
I saw in the documentation that AlleleBalance or AlleleBalanceBySample can't be calculated for indels and searched the forums to see if that is still true (in case the documentation is out of date) and found this thread. Is that still the case? If so, do you know the reason why? It doesn't seem to me that the formula contains anything that is not available for indels, unless I'm missing something. With some exome capture kits that have low median coverage, we have a high rate of indels with low allele balance, so it would be very helpful to have this annotation specifically for indels in order to use for filtering. Even if GATK maintains the disclaimer that it is experimental and results should be interpreted with caution, it would be nice if we could "experiment" with it.
Thanks,
Andrew

Could you figure out what you need using the related DepthPerAlleleBySample annotation? I suspect the reason this annotation isn't available for indels is that representation introduces ambiguity to the interpretation.
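As a rough illustration of working from DepthPerAlleleBySample (the AD field), the sketch below flags heterozygous calls where one of the called alleles is supported by too few reads. It applies equally to SNP and indel records because it only reads GT and AD. The function names and the 0.25 default threshold are assumptions made for illustration, not GATK recommendations.

```python
def called_alleles(gt):
    """Split a GT value such as "0/1" or "0|1" into allele index strings."""
    return gt.replace("|", "/").split("/")


def low_balance_het(gt, ad, min_fraction=0.25):
    """Return True for a diploid het call whose less-supported called
    allele has a read fraction below min_fraction (an arbitrary default).
    """
    alleles = called_alleles(gt)
    if len(alleles) != 2 or "." in alleles or alleles[0] == alleles[1]:
        return False  # missing, non-diploid, or homozygous genotype
    if "." in ad.split(","):
        return False  # no usable allele depths
    depths = [int(x) for x in ad.split(",")]
    indices = [int(a) for a in alleles]
    if max(indices) >= len(depths):
        return False  # AD does not cover the called alleles
    total = sum(depths)
    if total == 0:
        return False
    minor = min(depths[i] for i in indices)
    return minor / total < min_fraction
```

For example, `low_balance_het("0/1", "16,4")` flags the call because the alternate allele is supported by only 4 of 20 reads, while a balanced `"10,10"` call is left alone.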

Hi @shlee,
Thanks for the reply. AlleleBalanceBySample could be helpful for manual correction of genotypes for individual samples and I could potentially use DepthPerAlleleBySample for this (by computing AlleleBalanceBySample annotations on the fly). However it seems that implementing the AlleleBalance annotation in VQSR would be much more powerful since this appears to be a systemic problem with this dataset, and I would prefer that over doing anything manual. One of the things I wanted to experiment with is adding ABHet to the suggested variant-level annotations so GATK can also learn what is a good allele balance from the positive variants, so we could get rid of the others. I can't think of what is ambiguous about the interpretation of AlleleBalance for indels compared to SNPs -- can you explain?
An alternative might be using allele count (AC) annotation, but it seems that with the variable depth in exome sequencing that might not work -- it only makes sense as a fraction of the depth. If you have any other ideas I'm open to them. (Sorry, just realizing I maybe should have opened a new thread.)

@andrewo As far as I can remember it's just that the indel case was not implemented at the time, and this annotation hasn't been developed any further due to lack of interest on our side. To be frank it hasn't even been ported to GATK4. It might still get ported in the future (or we would accept a pull-request if someone wants to take a stab at it) but considering it hasn't bubbled up as being worthwhile for us in the past, I wouldn't recommend holding your breath. In terms of filtering, we've put a lot of our eggs in the deep learning basket, and we have a prototype tool that is intended to replace VQSR that is performing much better, especially on indels. I realize that doesn't help you solve your right-now problem of course... Unfortunately we have limited resources, considering all the work that needs to be done, and so we have to prioritize our efforts quite brutally.