I noticed an issue when feeding Clair3 gVCF files into GATK's GenotypeGVCFs for cohort calling. It became obvious at a SNP that is likely located on one copy of a large duplication (indicated by increased coverage of the region and inconsistent haplotypes). The SNP therefore has a biased allele ratio, but it clearly has >8% T, as the cutoff would be:
[image]
Clair3's gVCF record for this SNP looks as follows:
10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385
Clair3 assigns a GQ of zero to this SNP, as I would have as well. However, the likelihood of a heterozygote is still higher than that of either homozygous state (PL 245,0,1385). Because the record is written as a reference block (ALT is only <NON_REF>, with MIN_DP but no AD), its FORMAT fields are those of a site without any possible alternative allele, and information is lost for GATK. A PASS variant for comparison:
10 23761040 . T A,<NON_REF> 19.22 PASS F GT:GQ:DP:AD:AF:PL 0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990
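To find other sites affected in the same way, one could scan the Clair3 gVCF for reference blocks (ALT is only `<NON_REF>`) whose PL is not lowest for the homozygous-reference genotype. This is a minimal sketch, assuming single-sample records with no whitespace inside any field; the function names are hypothetical and not part of Clair3 or GATK.

```python
from __future__ import annotations


def parse_sample(line: str) -> tuple[list[str], dict[str, str]]:
    """Split a single-sample VCF data line into its columns and a FORMAT -> value dict."""
    # Assumption: no whitespace inside fields, so split() works for tabs or spaces.
    fields = line.split()
    return fields, dict(zip(fields[8].split(":"), fields[9].split(":")))


def is_suspicious_ref_block(line: str) -> bool:
    """True for a reference block (ALT == <NON_REF>) whose PL does not favour 0/0."""
    fields, sample = parse_sample(line)
    if fields[4] != "<NON_REF>" or "PL" not in sample:
        return False
    pls = [int(x) for x in sample["PL"].split(",")]
    # PL order for a ref block is 0/0, 0/1, 1/1 (1 = <NON_REF>);
    # the most likely genotype has the lowest PL.
    return pls.index(min(pls)) != 0


record = "10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385"
print(is_suspicious_ref_block(record))  # True: PL is lowest for 0/1
```

Sites flagged this way are candidates for the information loss described above, since GenotypeGVCFs only sees PLs and MIN_DP for them.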
GATK then does something strange. It does not respect Clair3's GQ of 0 but assigns 99, probably because the variant was discovered in other samples as well. Additionally, it lacks complete AD information, which results in a 0/1 call without any reads supporting the alternative allele:
10 23761033 . G T 1093.24 . [...] GT:AD:AF:DP:GQ:PL 0/1:68,0:.:68:99:245,0,1385 [...]
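The GQ of 99 may follow from the PLs alone. GATK defines GQ as the difference between the lowest and the second-lowest PL, capped at 99. Assuming GenotypeGVCFs recomputes GQ this way from the PLs it carries over from the gVCF, the arithmetic for this site is sketched below; the function name is hypothetical.

```python
from __future__ import annotations


def gq_from_pl(pls: list[int], cap: int = 99) -> int:
    """Genotype quality as the gap between the best and second-best Phred-scaled likelihood."""
    best, second = sorted(pls)[:2]
    return min(second - best, cap)


clair3_pl = [245, 0, 1385]    # PL from the Clair3 reference block
print(gq_from_pl(clair3_pl))  # 99 (245 - 0 = 245, capped at 99)
```

For comparison, the same formula gives 28 for the PASS record (PL 28,0,45,...), where Clair3 reported GQ 19, which suggests that Clair3 derives GQ differently from its PLs.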
I guess solving the problem needs input from both tools, Clair3 and GATK, if I am correct. If I understood the issue correctly, Clair3 should output AD information for more sites, and GATK should respect Clair3's GQ in this case.
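Until either tool changes, a post-hoc check on the joint-called VCF could flag calls like the one above, where a genotype contains an ALT allele that has zero supporting reads in AD. This is a minimal sketch, assuming the FORMAT string and one sample column are passed in separately; the function name is hypothetical.

```python
def called_alt_without_reads(fmt: str, sample: str) -> bool:
    """True if GT calls an ALT allele whose AD entry is 0, or AD is missing entirely."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    gt = values.get("GT", ".").replace("|", "/")
    alts = {a for a in gt.split("/") if a not in (".", "0")}
    if not alts:
        return False  # hom-ref or no-call: nothing to check
    ad_field = values.get("AD", ".")
    if ad_field == ".":
        return True  # ALT called but no allele depths at all
    ad = [int(x) for x in ad_field.split(",")]
    # Treat an allele index beyond the AD list as unsupported, too.
    return any(int(a) >= len(ad) or ad[int(a)] == 0 for a in alts)


# FORMAT and sample column of the GenotypeGVCFs output shown above
print(called_alt_without_reads("GT:AD:AF:DP:GQ:PL", "0/1:68,0:.:68:99:245,0,1385"))  # True
```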
The GenotypeGVCFs tool does not guarantee compatibility with input files generated by tools other than GATK itself. Though you may modify your GVCFs to make them compatible with GenotypeGVCFs, your best bet would be to use another tool such as GLnexus to joint-genotype your GVCF files. Clair3 mentions a configuration file that works with GLnexus, so you may be better off checking that route.
I have also posted the issue on the Clair3 page: HKU-BAL/Clair3#354.
Thanks,
Johannes