1. I have only a dbSNP file as a training set, and I set the options known=true,training=false,truth=false,prior=6.0 on the command line as per the documentation. But that doesn't work, and I was instead advised to use known=false,training=true,truth=true,prior=6.0. What is prior=6.0 here? Is there any threshold for the prior?

2. The above command produces empty tranches and recal files.

3. Even though the files are empty, I proceeded to ApplyRecalibration with the command below:

ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:

ERROR

Answers

You have to specify at least one training set containing truth variants for VQSR to work. The prior is the prior likelihood that you assign to variants in the truth set. It represents the probability that a variant in that set is indeed true and not an artifact. The value depends mainly on how confident you are about the quality of the call set. See more discussion on this here.
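To make the prior value concrete: in the GATK documentation the prior is written on the Phred scale (for example, Q15 is quoted as 96.84%). The sketch below converts a Phred-scaled prior to a probability and back. The function names are made up for illustration and are not part of GATK.

```python
import math

def phred_to_probability(q: float) -> float:
    """Probability that a variant in the resource is true, for a Phred-scaled prior q."""
    return 1.0 - 10.0 ** (-q / 10.0)

def probability_to_phred(p: float) -> float:
    """Inverse conversion: Phred-scaled prior for a probability p (requires p < 1)."""
    return -10.0 * math.log10(1.0 - p)

# Print the probability implied by a few prior values, including prior=6.0.
for q in (2.0, 6.0, 10.0, 15.0):
    print(f"prior={q:>4} -> P(true) = {phred_to_probability(q):.4f}")
```

Read this way, prior=6.0 corresponds to roughly a 75% chance that a variant in the resource is real, so there is no fixed threshold; a higher prior simply expresses more confidence in the resource.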

What was the console output? Did you get any warnings or error messages?

If the files are empty, there is no point in running the next step; it will not work.

Thanks. There seems to be an error with the -an Inbreed annotation. I removed it and it works now. I added the options -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 to VariantRecalibrator along with the above command, followed by ApplyRecalibration. Now I have the recalibrated scores. Could you let me know how to interpret the VQSLOD scores and the PASS or fail filter?

Does it mean that the higher the score, the more reliable the variant? Or the other way around?

A very frequent question about filtering parameters is what the ideal thresholds are for filters such as QUAL (quality of the SNP) and mapping quality (MQ), and the most frequent answer is that it depends on the dataset.

QUAL and MQ are Phred-scaled probability scores for the variant. Can we use QUAL > 40 and MQ > 40 to get a good set of filtered variants irrespective of the dataset?

Unfortunately, there is no absolute rule that will yield a good set of filtered variants irrespective of the dataset. Part of the problem is how you define a good set: is it a very sensitive set, or a very specific set? If you use very stringent quality filters, you will probably get a very specific set, but you will miss variants that are real despite having low scores. If you lower the filter thresholds to retrieve those variants, you also let in false positives.
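The trade-off can be illustrated with a small sketch. The QUAL values and true/false labels below are invented purely for illustration (they are not real call data), and `evaluate_threshold` is a hypothetical helper, not a GATK function.

```python
# Toy, made-up (QUAL, is_real) pairs for illustration only; not real call data.
variants = [
    (12.0, False), (18.0, True), (25.0, False), (33.0, True),
    (41.0, True), (47.0, False), (55.0, True), (80.0, True),
]

def evaluate_threshold(calls, min_qual):
    """Return (sensitivity, specificity) when keeping calls with QUAL >= min_qual."""
    real_total = sum(1 for _, is_real in calls if is_real)
    false_total = len(calls) - real_total
    # Real variants that survive the filter.
    real_kept = sum(1 for qual, is_real in calls if is_real and qual >= min_qual)
    # False calls that the filter correctly removes.
    false_removed = sum(1 for qual, is_real in calls if not is_real and qual < min_qual)
    sensitivity = real_kept / real_total if real_total else 0.0
    specificity = false_removed / false_total if false_total else 0.0
    return sensitivity, specificity

for cutoff in (10, 30, 40, 50):
    sens, spec = evaluate_threshold(variants, cutoff)
    print(f"QUAL >= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy set, raising the cutoff removes more false calls but also discards real variants, which is why no single QUAL or MQ threshold suits every dataset.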

That is the point of VQSR: to identify patterns of covariation that are more informative than simply filtering on quality scores, and to fine-tune the filtering to achieve your desired compromise between sensitivity and specificity. But it is not perfect, and it cannot be used with every dataset. In any case, you need to experiment with the settings to find what works for you.
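On interpreting the scores: VQSLOD is a log-odds value, so a higher VQSLOD means the model considers the variant more likely to be real. The -tranche values are truth-sensitivity targets, and the sketch below shows the general idea of turning such a target into a VQSLOD cutoff. It is a simplified illustration, not GATK's implementation; the scores are invented, and the names `vqslod_cutoff`, `apply_filter`, and the "FILTERED" label are hypothetical.

```python
import math

def vqslod_cutoff(truth_site_scores, target_sensitivity_pct):
    """Lowest VQSLOD cutoff that still keeps the target percentage of truth-site variants."""
    if not truth_site_scores:
        raise ValueError("need at least one truth-site score")
    ranked = sorted(truth_site_scores, reverse=True)  # highest (best) score first
    n_keep = math.ceil(len(ranked) * target_sensitivity_pct / 100.0)
    n_keep = max(1, min(n_keep, len(ranked)))
    return ranked[n_keep - 1]

def apply_filter(vqslod, cutoff):
    """Label a variant by comparing its VQSLOD to the chosen cutoff (labels are illustrative)."""
    return "PASS" if vqslod >= cutoff else "FILTERED"

# Made-up VQSLOD scores of called variants that overlap the truth resource.
truth_scores = [8.2, 6.5, 5.1, 4.0, 3.3, 2.7, 1.9, 0.4, -1.2, -6.8]

for target in (100.0, 90.0, 50.0):
    cutoff = vqslod_cutoff(truth_scores, target)
    print(f"target sensitivity {target}% -> VQSLOD cutoff {cutoff}")
```

A higher target keeps more truth-site variants by accepting a lower cutoff, which also lets in more false positives; a lower target does the opposite.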