How to split multiple samples in a single VCF file?

I finally got the filtered VCF file from a BWA + Picard + GATK pipeline. I have 11 exome-seq data files, which were processed together as a list of inputs to GATK. While generating the VCF, I did not see an option for separating the 11 samples. Now I have two VCF files (one for SNPs and the other for indels), each containing all 11 samples. My question is how to proceed from here.

Should I split the samples before annotation, or annotate first and then split the 11 samples into individual files? The big question is how to split the samples out of the VCF files in the first place. Thanks

Best Answer

Perhaps the better question is why you want to split the variants into separate files. If the samples are part of a cohort in the same study, the usual practice is to keep them together in a multisample VCF. This makes joint analysis easier, and it is more efficient in terms of storage size.

But if you do want to separate them, you can use SelectVariants to extract the variants for each sample by name.
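As a minimal sketch of that approach (assuming GATK4's `gatk` wrapper script, an uncompressed VCF, and the placeholder file names `ref.fasta` and `snps.vcf`), the sample names can be read from the `#CHROM` header line and one SelectVariants call built per sample. `--sample-name` keeps only the named sample, and `--exclude-non-variants` drops sites where that sample carries no non-reference call. In GATK3-era syntax the rough equivalent was `java -jar GenomeAnalysisTK.jar -T SelectVariants ... -sn NAME -env -o OUT`; check the documentation for your version.

```python
import shlex
from typing import Iterable, List


def sample_names(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample names from the #CHROM header line.

    The first nine columns are fixed (CHROM POS ID REF ALT QUAL FILTER INFO
    FORMAT); every column after FORMAT is one sample.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def select_variants_cmd(ref: str, vcf: str, sample: str, out_dir: str = ".") -> List[str]:
    """Build a GATK4 SelectVariants argument list that keeps a single sample."""
    return [
        "gatk", "SelectVariants",
        "-R", ref,
        "-V", vcf,
        "--sample-name", sample,
        # Drop sites that are no longer variant once only this sample is kept.
        "--exclude-non-variants",
        "-O", f"{out_dir}/{sample}.vcf",
    ]


if __name__ == "__main__":
    # Hypothetical two-sample header; in practice, read it from your VCF file.
    header = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    ]
    for s in sample_names(header):
        print(shlex.join(select_variants_cmd("ref.fasta", "snps.vcf", s)))
```

The script only prints the commands, so they can be reviewed before running them. The same loop would be repeated for the indel VCF.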

Answers

Thanks Geraldine for the answer. A follow-up question, though: since I have 11 samples in the VCF, if I do not split them, how do I filter the VCF for each individual sample at the same time? For example, my criteria are at least 20x depth of coverage and at least 10 reads supporting the variant allele. Can I still use SelectVariants with the parameter -select "DP > 20.0"? And how do I distinguish the total DP from the requirement of at least 10 variant-allele reads?
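The distinction being asked about is between the site-level INFO/DP (reads summed over all samples) and the per-sample FORMAT fields DP (reads for that sample) and AD (reads supporting REF, then each ALT allele). A minimal sketch of per-sample filtering on one VCF data line, assuming the FORMAT column includes DP and AD, reading "at least" as >= for both thresholds, and summing all ALT entries of AD (a simplifying choice for multiallelic sites):

```python
from typing import Dict, List


def per_sample_pass(record: str, samples: List[str],
                    min_dp: int = 20, min_alt: int = 10) -> Dict[str, bool]:
    """Report, for one VCF data line, which samples meet both thresholds.

    Uses FORMAT/DP and FORMAT/AD for each sample; INFO/DP (the total over
    all samples) is deliberately ignored.
    """
    cols = record.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")
    result = {}
    for name, value in zip(samples, cols[9:]):
        # Trailing FORMAT fields may be omitted; missing keys default to ".".
        fields = dict(zip(fmt_keys, value.split(":")))
        dp_str = fields.get("DP", ".")
        ad = fields.get("AD", ".").split(",")
        if dp_str == "." or "." in ad:
            result[name] = False  # missing data is treated as failing
            continue
        alt_reads = sum(int(x) for x in ad[1:])  # all counts after the REF count
        result[name] = int(dp_str) >= min_dp and alt_reads >= min_alt
    return result


if __name__ == "__main__":
    # Hypothetical three-sample site; INFO/DP=60 is the site total.
    line = "1\t100\t.\tA\tG\t50\tPASS\tDP=60\tGT:AD:DP\t0/1:12,15:27\t0/1:15,5:20\t./.:.:."
    print(per_sample_pass(line, ["S1", "S2", "S3"]))
```

Here S1 passes (DP 27, 15 ALT reads), S2 fails on ALT support (DP 20, only 5 ALT reads), and S3 fails because its call is missing, even though the site-level DP=60 would pass "DP > 20.0". Within GATK itself, JEXL expressions can reportedly reach genotype-level values (for example `vc.getGenotype("S1").getDP()`), but check the JEXL article for your GATK version before relying on that syntax.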