Hi,
I noticed that the dragen_sv_merge.py script works well for merging STR, SV, and CNV calls at the sample level. I am now looking for a way to merge SVs across a cohort to produce a project-level VCF file.
Your preprint (https://doi.org/10.1101/2024.01.02.573821) describes using bcftools and Truvari to merge SVs across samples. However, I am concerned about the large number of ./. (missing) genotypes this produces and the resulting high missing rates. Does your methodology include additional genotyping steps that could mitigate this?
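One way to put a number on this concern is the per-site missing rate: the fraction of samples whose genotype is missing at a given site. The minimal Python sketch below assumes an uncompressed, tab-separated VCF data line in which GT is the first FORMAT key (the VCF specification requires this whenever GT is present). It counts a genotype as missing only when every allele is '.', so a partial call such as './1' is treated as called; other tools may count partial calls differently. The function names are hypothetical.

```python
def genotypes_from_record(line):
    """Return the GT string of every sample in one tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if cols[8].split(":")[0] != "GT":
        raise ValueError("expected GT as the first FORMAT key")
    # Sample columns start at index 9; GT is the first ':'-separated subfield.
    return [sample.split(":")[0] for sample in cols[9:]]


def is_missing(gt):
    """True when every allele is '.', e.g. './.', '.|.' or haploid '.'."""
    return all(allele == "." for allele in gt.replace("|", "/").split("/"))


def site_missing_rate(gts):
    """Fraction of samples at one site whose genotype is fully missing."""
    if not gts:
        raise ValueError("no samples at this site")
    return sum(is_missing(gt) for gt in gts) / len(gts)


record = "chr21\t100\tsv1\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP\tGT:FT\t0/1:PASS\t./.:.\t./.:.\t1/1:PASS"
print(site_missing_rate(genotypes_from_record(record)))  # 0.5
```

Averaging site_missing_rate over all records (or filtering sites above a threshold) gives a simple cohort-level view of how much the ./. genotypes matter.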
Thank you for your insights and time.
Best regards,
Jixin
Hi @jxcao98, sorry for the late response. We used Truvari to merge the STR, SV, and CNV calls within each sample, and then used bcftools merge to merge the VCFs across all samples. Yes, merging SVs is always tricky. For the analysis in our study, we did not use any additional genotyping steps.
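The cross-sample step described here can be scripted. The sketch below only builds the file list and the argument vector for bcftools merge, using the flags Jixin reports (-l, -m none, -0) plus the standard output options -O z and -o; it does not reproduce the authors' actual pipeline. The helper names and file names are hypothetical, and the inputs are assumed to be bgzipped, indexed per-sample VCFs (for example, the per-sample outputs of the Truvari step).

```python
import subprocess


def file_list_text(vcf_paths):
    """Contents for `bcftools merge -l`: one VCF path per line."""
    return "".join(f"{path}\n" for path in vcf_paths)


def merge_argv(list_file, out_vcf, missing_to_ref=True):
    """Argument vector for a cohort-level bcftools merge (not executed here)."""
    return [
        "bcftools", "merge",
        "-l", list_file,   # file listing the per-sample VCFs
        "-m", "none",      # do not join records into multiallelic sites
        *(["-0"] if missing_to_ref else []),  # absent records written as 0/0
        "-O", "z",         # bgzip-compressed VCF output
        "-o", out_vcf,
    ]


def run_merge(vcf_paths, list_file="filelist.txt", out_vcf="cohort.vcf.gz"):
    """Write the file list, then invoke bcftools (must be on PATH)."""
    with open(list_file, "w") as fh:
        fh.write(file_list_text(vcf_paths))
    subprocess.run(merge_argv(list_file, out_vcf), check=True)
```

Passing an argument list to subprocess.run instead of a single shell string avoids quoting problems with unusual file paths, and keeping -0 behind a flag makes it easy to produce both versions of the merged VCF for comparison.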
Thanks for your response. I am building a population-level SV call set for approximately 500K participants in the UK Biobank. I have merged the individual SV calls from all participants into a single VCF file with the command bcftools merge -l ./filelist.txt -m none -0. My next goal is to collapse potentially redundant SVs across the cohort. I know Truvari can do this, but I am having trouble running truvari collapse on a dataset this large: I have not been able to complete a run even on the smallest subset, the DUPs on chromosome 21, despite slimming the VCF down as much as possible. I suspect its memory requirements are the problem.
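To make the collapse step concrete, here is a deliberately simplified, hypothetical greedy collapse in Python. It is not Truvari's implementation: it only merges calls of the same SV type whose starts and ends lie within refdist bases of each other and which pass minimum reciprocal-overlap and size-similarity thresholds (the default values are illustrative). Calls are processed in coordinate order, and representatives more than refdist bases behind the current call leave the comparison window, which bounds the number of comparisons per call; this is one general way to keep the work local, not a description of how Truvari manages memory.

```python
from dataclasses import dataclass, field


@dataclass
class SV:
    chrom: str
    start: int
    end: int
    svtype: str
    carriers: set = field(default_factory=set)  # sample IDs with a non-ref GT


def reciprocal_overlap(a, b):
    """Overlap length divided by the longer of the two intervals."""
    ovl = min(a.end, b.end) - max(a.start, b.start)
    if ovl <= 0:
        return 0.0
    return ovl / max(a.end - a.start, b.end - b.start)


def size_similarity(a, b):
    """Shorter length divided by longer length (1.0 for two zero-length calls)."""
    la, lb = a.end - a.start, b.end - b.start
    return 1.0 if max(la, lb) == 0 else min(la, lb) / max(la, lb)


def collapse(svs, refdist=500, min_ovl=0.5, min_size=0.7):
    """Greedily merge each call into the first compatible earlier representative."""
    kept, window = [], []
    for sv in sorted(svs, key=lambda s: (s.chrom, s.start)):
        # Representatives more than refdist upstream can never match later calls.
        window = [r for r in window
                  if r.chrom == sv.chrom and sv.start - r.start <= refdist]
        match = next((r for r in window
                      if r.svtype == sv.svtype
                      and abs(sv.end - r.end) <= refdist
                      and reciprocal_overlap(r, sv) >= min_ovl
                      and size_similarity(r, sv) >= min_size), None)
        if match is None:
            rep = SV(sv.chrom, sv.start, sv.end, sv.svtype, set(sv.carriers))
            kept.append(rep)
            window.append(rep)
        else:
            match.carriers |= sv.carriers  # absorbed call's carriers join the rep
    return kept


calls = [
    SV("chr21", 1000, 2000, "DUP", {"s1"}),
    SV("chr21", 1050, 2020, "DUP", {"s2"}),
    SV("chr21", 1000, 2000, "DEL", {"s3"}),
    SV("chr21", 50000, 51000, "DUP", {"s4"}),
]
for rep in collapse(calls):
    print(rep.svtype, rep.start, rep.end, sorted(rep.carriers))
# DUP 1000 2000 ['s1', 's2']
# DEL 1000 2000 ['s3']
# DUP 50000 51000 ['s4']
```

In this toy input the two overlapping DUPs collapse into one record carried by s1 and s2, while the DEL at the same position and the distant DUP stay separate. Processing one chromosome (or one SV type) at a time, as already attempted with chr21 DUPs, is the natural unit for running such a pass.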
I am also wondering whether the -0 option of bcftools merge is necessary. Without it, the merged VCF contains a large number of ./. genotypes, which may call for re-genotyping, and re-genotyping can be quite costly.
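The difference -0 makes can be illustrated with a toy merge. The bcftools documentation describes -0 (--missing-to-ref) as assuming that genotypes at missing sites are 0/0, i.e. a sample with no record at a site is written as reference rather than ./.. The sketch below models only that filling rule on in-memory data. Keying sites by (chrom, pos, alt) is a simplifying assumption, and in this sketch a record a sample does have, even one that is ./., keeps its value. The function name is hypothetical.

```python
def merge_genotypes(per_sample, missing_to_ref=False):
    """Build {site: [GT per sample]} from per-sample {site: GT} dicts.

    A sample with no record at a site gets './.' by default, or '0/0' when
    missing_to_ref is True (a simplified model of `bcftools merge -0`).
    Records a sample does have, including './.', are kept as-is.
    """
    samples = list(per_sample)
    sites = sorted({site for calls in per_sample.values() for site in calls})
    fill = "0/0" if missing_to_ref else "./."
    return {site: [per_sample[s].get(site, fill) for s in samples] for site in sites}


per_sample = {
    "s1": {("chr21", 100, "<DUP>"): "0/1"},
    "s2": {("chr21", 500, "<DUP>"): "1/1", ("chr21", 100, "<DUP>"): "./."},
}
print(merge_genotypes(per_sample))
print(merge_genotypes(per_sample, missing_to_ref=True))
```

Filling with 0/0 lowers the missing rate without any re-genotyping, but it treats "no call in this sample" as "reference in this sample". That is reasonable only where the sample's data would have supported a call at that site; otherwise it can bias allele frequencies downward.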
Finally, I would like to understand best practices for using bcftools merge and truvari collapse in DRAGEN workflows, particularly for merging samples across a population while minimizing redundant calls.