How to merge structural variants at the population level? #3

Open
jxcao98 opened this issue Aug 12, 2024 · 2 comments

Comments

jxcao98 commented Aug 12, 2024

Hi,

I noticed the dragen_sv_merge.py script is effective for merging STR, SV, and CNV files at the sample level. I am currently exploring options to merge SVs across a cohort to generate a project-level VCF file.

Your preprint article (https://doi.org/10.1101/2024.01.02.573821) outlines the use of bcftools and truvari for merging SVs between samples. However, I am concerned about the substantial presence of ./. genotypes and the resulting high missing rates. Are there additional genotyping steps in your methodology that could mitigate these issues?

Thank you for your insights and time.

Best regards,
Jixin

Owner

srbehera commented Dec 2, 2024

Hi @jxcao98, sorry for the late response. We used Truvari to merge the STR, SV, and CNV calls within each sample, and then used bcftools merge to combine the per-sample VCFs across all samples. Merging SVs is always tricky. For the analysis in our study, we did not use any additional genotyping steps.

Author

jxcao98 commented Dec 3, 2024

Thanks for your response. I am working on constructing a population-level SV call set for approximately 500K participants in the UK Biobank. I have successfully merged the per-participant SV calls into a single VCF file using the command `bcftools merge -l ./filelist.txt -m none -0`. My next goal is to collapse potentially redundant SVs within the cohort. I am aware that Truvari can perform this task, but I am having trouble running truvari collapse on such a large dataset. Specifically, I have been unable to run it successfully even on the smallest subset, the DUP calls on chromosome 21, despite minimizing the VCF representation as much as possible. I suspect the cause is its substantial memory requirements.
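One way to reduce the memory footprint of a collapse step is to split the merged call set into pieces that cannot interact with each other and process each piece separately. The sketch below is only an illustration of that idea, not Truvari's own algorithm: the `SV` record type, the `chunk_svs` helper, and the `max_gap` threshold are hypothetical names chosen for this example. It groups records by chromosome and SV type, then starts a new chunk whenever a record begins more than `max_gap` bp beyond the furthest end seen so far.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class SV(NamedTuple):
    """Minimal SV record (hypothetical type for this sketch)."""
    chrom: str
    start: int
    end: int
    svtype: str


def chunk_svs(records: Iterable[SV], max_gap: int = 1000) -> Iterator[list[SV]]:
    """Yield groups of SVs that could be collapsed independently.

    Records are grouped by (chrom, svtype). Within a group, a new chunk
    starts when a record begins more than `max_gap` bp past the furthest
    end coordinate seen in the current chunk.
    """
    groups: dict[tuple[str, str], list[SV]] = defaultdict(list)
    for rec in records:
        groups[(rec.chrom, rec.svtype)].append(rec)

    for key in sorted(groups):
        chunk: list[SV] = []
        furthest_end = 0
        for rec in sorted(groups[key], key=lambda r: (r.start, r.end)):
            if chunk and rec.start > furthest_end + max_gap:
                yield chunk
                chunk, furthest_end = [], rec.end
            chunk.append(rec)
            furthest_end = max(furthest_end, rec.end)
        if chunk:
            yield chunk


if __name__ == "__main__":
    calls = [
        SV("chr21", 100, 500, "DUP"),
        SV("chr21", 300, 900, "DUP"),
        SV("chr21", 5000, 6000, "DUP"),
        SV("chr21", 350, 800, "DEL"),
    ]
    for chunk in chunk_svs(calls):
        print(chunk[0].chrom, chunk[0].svtype, len(chunk))
```

Each chunk could then be written to its own small VCF and collapsed on its own, with the results concatenated afterwards. Whether this is sufficient at the 500K-sample scale would need to be checked, since the width of each record (one genotype column per sample) is unaffected by this kind of partitioning.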

Additionally, I am wondering whether the `-0` (`--missing-to-ref`) option of bcftools merge is necessary. Without it, the merged VCF contains a large number of ./. genotypes, which may necessitate additional re-genotyping, and that can be quite costly.
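To quantify the problem before deciding on `-0`, the per-site missing rate can be computed directly from the merged VCF. The following is a minimal sketch that assumes a tab-delimited VCF data line with a `GT` key in the FORMAT column; `missing_rate` is a hypothetical helper written for this example, not part of bcftools or Truvari. Note that `-0` sets these genotypes to 0/0, which treats the absence of a call as homozygous reference. That is an assumption about the sample, not a measurement.

```python
def missing_rate(vcf_line: str) -> float:
    """Fraction of samples whose GT is fully missing in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_idx = fields[8].split(":").index("GT")
    samples = fields[9:]
    missing = 0
    for sample in samples:
        gt = sample.split(":")[gt_idx]
        # Treat phased and unphased separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            missing += 1
    return missing / len(samples)


if __name__ == "__main__":
    line = "chr21\t1000\tsv1\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP\tGT\t0/1\t./.\t./.\t1/1"
    print(missing_rate(line))  # 2 of 4 samples are ./. -> 0.5
```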

Lastly, I am seeking to understand the best practices for using bcftools merge and truvari collapse within DRAGEN workflows, particularly in terms of how these tools are utilized for merging samples across populations and minimizing redundancy.
