I have the following problem: I am working with several Ion Torrent datasets from targeted sequencing. Because the targets do not cover the full exome, I have to use hard filtering to keep only those SNPs and indels (called by HaplotypeCaller) that seem reliable. For the SNPs the results are quite satisfying. For the indels, however, I am left with a relatively long list of candidates. Unfortunately I don't know the biological truth, so it is very hard to tune the filter parameters so that the false-positive indels are excluded.

Therefore, I am looking for a dataset sequenced on Ion Torrent that comes with a list of validated indels. With such a dataset I could optimize my pipeline to detect the true indels without all those false positives.
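
To make clear what I mean by optimizing against a validated list, here is a minimal Python sketch of the comparison I have in mind. The function names are my own and not part of GATK. It assumes uncompressed VCF lines, that calls and validated indels are represented the same way (same left-alignment and trimming), and it matches on exact CHROM/POS/REF/ALT only.

```python
def load_indels(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys for indel alleles from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            # Skip symbolic or spanning-deletion alleles such as <DEL> or *
            if alt.startswith("<") or alt == "*":
                continue
            # Simplification: treat any allele whose length differs from REF as an indel
            if len(ref) != len(alt):
                keys.add((chrom, int(pos), ref, alt))
    return keys


def precision_recall(called, truth):
    """Precision and recall of the called indel set against the validated set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Running this on the filtered calls for each candidate set of thresholds would show how a change trades false positives against missed indels.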

Since you propose certain thresholds for hard-filtering indels, I thought you might have datasets with validated indels on which those recommendations are based. If so, is there a place where I can find them?

I am trying to merge two VCFs (SNVs and indels) from the same sample. The problem appears to be that the indel VCF defines "combined_sample_name" but the SNV VCF does not, so the merged file ends up with two sample columns. How can I force GATK to treat them as a single sample?

I tried --assumeIdenticalSamples to do a "simple merge," but that made no difference.
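
One workaround I could imagine, sketched here in Python, is to rewrite the sample name in the #CHROM header line of one file before merging, so that both files carry the same name. rename_single_sample is a hypothetical helper of my own, not a GATK tool, and it assumes an uncompressed VCF with exactly one sample column.

```python
def rename_single_sample(vcf_lines, new_name):
    """Yield VCF lines with the single sample column in the #CHROM line renamed."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # 8 fixed columns + FORMAT + one sample column = 10 fields
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column")
            fields[9] = new_name
            yield "\t".join(fields) + "\n"
        else:
            # Meta-information and record lines pass through unchanged
            yield line
```

Only the header line changes, so the records themselves stay identical to the original file.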