GenotypeGVCFs variant IDs

I am trying to use GenotypeGVCFs to perform joint genotyping on 16 samples. Each sample was sequenced twice, on two different machines, so I actually have 32 read sets. I ran HaplotypeCaller on each read set to produce a GVCF, and I am now trying to combine these into a single multi-sample VCF that contains information for all variant loci across the cohort. However, because the two read sets for each sample share the same sample name, GenotypeGVCFs seems to collapse them, and only 16 samples are recorded in my output VCF. I tried naming the inputs with --variant:name input1.g.vcf in both GenotypeGVCFs and CombineGVCFs, but got the same result: half the samples are missing from the output. I know CombineVariants can do this, but it does not accept GVCF input. Is it possible to specify names for the variants when using GenotypeGVCFs?

Answers

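The sample names need to be specified in the read groups (the SM tag) of the input BAMs, because that is where HaplotypeCaller gets the sample name it writes to each GVCF. However, the easiest thing to do at this point is to manually edit the sample names in the GVCFs. As long as the sample names differ between the GVCFs, GenotypeGVCFs will process them as different samples.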
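
If editing 32 files by hand is tedious, the rename can be scripted. The following is only a minimal sketch, assuming each GVCF is uncompressed plain text and contains exactly one sample; the function names, file names, and the `_run2` suffix are hypothetical and not part of GATK. It rewrites the sample column of the `#CHROM` header line and leaves every other line untouched.

```python
from typing import Iterable, Iterator


def rename_single_sample(lines: Iterable[str], new_name: str) -> Iterator[str]:
    """Yield VCF lines with the sample column of the #CHROM line renamed.

    Assumes a single-sample file: 9 fixed columns (CHROM through FORMAT)
    followed by exactly one sample column.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 10:
                raise ValueError(
                    f"expected exactly one sample column, found {len(fields) - 9}"
                )
            fields[9] = new_name  # the sample name is the 10th column
            yield "\t".join(fields) + "\n"
        else:
            # Meta-information lines and records pass through unchanged.
            yield line


def rename_gvcf(in_path: str, out_path: str, new_name: str) -> None:
    """Write a copy of an uncompressed single-sample GVCF with a new sample name."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_single_sample(src, new_name))


# Hypothetical usage: give the second sequencing run a distinct name.
# rename_gvcf("sample01_run2.g.vcf", "sample01_run2.renamed.g.vcf", "sample01_run2")
```

Any index file that accompanies the original GVCF will not match the renamed copy, so it would need to be regenerated before running GenotypeGVCFs. Existing tools such as `bcftools reheader` can do the same job and also handle compressed files.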