I noticed an issue when feeding Clair3 gVCF files into GATK's GenotypeGVCFs for cohort calling. The SNP where it became obvious is likely located on one copy of a large duplication (shown by increased coverage of the region and inconsistent haplotypes) and therefore has a biased allele ratio, but it clearly has more than 8% T, which would be the cutoff.
The Clair3 gVCF record for this SNP looks as follows:
10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385
Clair3 assigns a GQ of zero to that SNP, as I would have as well. However, the likelihood for a heterozygote is still higher than for either homozygous state. Because the FORMAT field is written as for a site without possible alternative alleles (GT:GQ:MIN_DP:PL, with no AD), information is lost for GATK. A PASS variant for comparison:
10 23761040 . T A,<NON_REF> 19.22 PASS F GT:GQ:DP:AD:AF:PL 0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990
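To make the difference between the two records concrete, here is a minimal Python sketch that compares their per-sample FORMAT keys. It assumes the records are whitespace-separated as pasted above (real VCF files are tab-separated), and the helper name `parse_sample` is hypothetical.

```python
# The two records quoted above, copied verbatim.
block_line = "10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385"
pass_line = "10 23761040 . T A,<NON_REF> 19.22 PASS F GT:GQ:DP:AD:AF:PL 0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990"


def parse_sample(line):
    """Map FORMAT keys to the first sample's values for one VCF record."""
    fields = line.split()  # assumption: no spaces inside any column
    keys = fields[8].split(":")
    values = fields[9].split(":")
    return dict(zip(keys, values))


block_rec = parse_sample(block_line)
pass_rec = parse_sample(pass_line)

# FORMAT keys present for the PASS variant but absent from the <NON_REF>-only record.
missing = sorted(set(pass_rec) - set(block_rec))
print(missing)  # ['AD', 'AF', 'DP']
```

The record at 23761033 carries only MIN_DP and PL, so a downstream tool has no per-allele depth to work from.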
GATK then does something odd. It does not respect the GQ=0 from Clair3 but assigns 99, probably because the variant was discovered in other samples as well. Additionally, it does not have full AD information, thus resulting in a 1/1 call without coverage for the alternative allele:
10 23761033 . G T 1093.24 . [...] GT:AD:AF:DP:GQ:PL 0/1:68,0:.:68:99:245,0,1385 [...]
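The GQ of 99 would be consistent with GQ being recomputed from the PL values rather than copied from the input. The sketch below assumes the common convention that GQ is the difference between the two smallest PL values, capped at 99. Whether GenotypeGVCFs does exactly this for this record is an assumption, and the function name `gq_from_pl` is hypothetical.

```python
def gq_from_pl(pl, cap=99):
    """Return (index of most likely genotype, GQ) derived from PL values.

    Assumed convention: GQ = second-smallest PL minus smallest PL, capped.
    """
    if len(pl) < 2:
        raise ValueError("need at least two PL values")
    order = sorted(range(len(pl)), key=lambda i: pl[i])
    best, runner_up = order[0], order[1]
    return best, min(pl[runner_up] - pl[best], cap)


# PL from the Clair3 record at 10:23761033 (genotype order 0/0, 0/1, 1/1).
pl = [int(x) for x in "245,0,1385".split(",")]
best, gq = gq_from_pl(pl)
print(best, gq)  # 1 99
```

Under that assumption, the PL alone yields a heterozygous call with GQ 99, regardless of the GQ=0 written by Clair3.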
I guess solving the problem needs input from both tools, Clair3 and GATK, if I am correct. If I understood the issue correctly, Clair3 should output AD information for more sites. Further, GATK should respect the GQ information from Clair3 in this case.
I will post the issue in the GATK repo as well and cross-link the issues (broadinstitute/gatk#9068).
Thanks,
Johannes