Problem in gvcf interplay between clair3 and gatk #354

Open
johannesgeibel opened this issue Jan 6, 2025 · 1 comment

@johannesgeibel

johannesgeibel commented Jan 6, 2025

I noticed an issue when feeding Clair3 gVCF files into GATK's GenotypeGVCFs for cohort calling. It became obvious at a SNP that is likely located on one copy of a large duplication (indicated by increased coverage in the region and inconsistent haplotypes) and therefore has a biased allele ratio, but that clearly shows >8% T, which would be the cutoff:

image

The gVCF record of clair3 for this SNP looks as follows:

10      23761033        .       G       <NON_REF>       0       .       END=23761033    GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385

Clair3 assigns a GQ of zero to that SNP, as I would have done too. However, the likelihood of the heterozygous genotype is still higher than that of either homozygous state. Because the FORMAT field is written as for a site without possible alternative alleles, information is lost for GATK. A PASS variant for comparison:

10      23761040        .       T       A,<NON_REF>     19.22   PASS    F       GT:GQ:DP:AD:AF:PL       0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990
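To make the difference between the two records concrete, here is a minimal Python sketch that parses the sample column of each record and reports which FORMAT keys the reference block lacks and which genotype its PL favors. The helper name `parse_sample` is hypothetical, and the records are copied from above with whitespace treated as the column separator.

```python
def parse_sample(record: str) -> dict:
    """Map FORMAT keys to the first sample's values (hypothetical helper)."""
    fields = record.split()          # columns 9 and 10 are FORMAT and sample
    return dict(zip(fields[8].split(":"), fields[9].split(":")))

ref_block = ("10 23761033 . G <NON_REF> 0 . END=23761033 "
             "GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385")
variant = ("10 23761040 . T A,<NON_REF> 19.22 PASS F "
           "GT:GQ:DP:AD:AF:PL 0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990")

block = parse_sample(ref_block)
missing = sorted(set(parse_sample(variant)) - set(block))
print(missing)                       # ['AD', 'AF', 'DP']

# With two alleles (REF and <NON_REF>) the VCF genotype order is 0/0, 0/1, 1/1.
pl = [int(x) for x in block["PL"].split(",")]
best = ["0/0", "0/1", "1/1"][pl.index(min(pl))]
print(best)                          # 0/1
```

So the reference block carries no AD at all, while its PL still points at the heterozygous genotype.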

GATK then does something strange. It does not respect the GQ=0 from Clair3 but assigns 99, probably because the variant was discovered in other samples as well. Additionally, it does not have full AD information, which results in a 0/1 call without any coverage for the alternative allele.

10      23761033        .       G       T       1093.24 .       [...]   GT:AD:AF:DP:GQ:PL 0/1:68,0:.:68:99:245,0,1385 [...]
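One possible explanation for the GQ of 99, sketched under an assumption: if the genotyper re-derives GQ from the PL values as the gap between the two smallest PLs, capped at 99 (which is how GATK's GQ is commonly described, but is not confirmed here for this code path), then the input GQ=0 would simply be overwritten. The function name `gq_from_pl` is hypothetical.

```python
def gq_from_pl(pl, cap=99):
    """GQ as the difference between the two smallest PLs, capped (assumed rule)."""
    ordered = sorted(pl)
    return min(cap, ordered[1] - ordered[0])

# PL carried over from the Clair3 reference block at 10:23761033
print(gq_from_pl([245, 0, 1385]))    # 99
```

Under that assumption, the PL of 245,0,1385 alone is enough to yield 0/1 with GQ 99, regardless of the GQ and missing AD in the input.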

I guess solving the problem needs input from both tools, Clair3 and GATK. If I understood the issue correctly, Clair3 should output AD information for more sites, and GATK should respect Clair3's GQ information in this case.
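Until that is resolved, affected sites could be listed before joint calling. The following is only a rough sketch: it scans uncompressed, single-sample gVCF lines for records whose only ALT is `<NON_REF>` but whose PL does not favor 0/0. The function name `ref_blocks_favoring_non_ref` is hypothetical, and PL is assumed to be present and numeric.

```python
def ref_blocks_favoring_non_ref(lines):
    """Yield (chrom, pos, pl) for <NON_REF>-only records whose best PL is not 0/0."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                              # skip blank and header lines
        f = line.split()
        if f[4] != "<NON_REF>":
            continue                              # only reference blocks
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        if "PL" not in sample:
            continue
        pl = [int(x) for x in sample["PL"].split(",")]
        if pl.index(min(pl)) != 0:                # index 0 is the 0/0 genotype
            yield f[0], int(f[1]), pl

records = [
    "10 23761033 . G <NON_REF> 0 . END=23761033 GT:GQ:MIN_DP:PL ./.:0:68:245,0,1385",
    "10 23761040 . T A,<NON_REF> 19.22 PASS F GT:GQ:DP:AD:AF:PL "
    "0/1:19:68:27,38,0:0.5588:28,0,45,990,990,990",
]
hits = list(ref_blocks_favoring_non_ref(records))
print(hits)                                       # [('10', 23761033, [245, 0, 1385])]
```

For a real file, the lines would come from the opened gVCF (for example via `gzip.open(path, "rt")` for a compressed one).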

I will post the issue at the GATK repo as well and cross-link the issues (broadinstitute/gatk#9068).

Thanks,
Johannes

@aquaskyline
Member

Would you consider GLNexus as an alternative for joint calling? A sample configuration for GLNexus is shown here: https://github.com/HKU-BAL/Clair3?tab=readme-ov-file#vcfgvcf-output-formats
