How to merge structural variants at the population level? #3

jxcao98 · 2024-08-12T13:27:43Z

Hi,

I noticed the dragen_sv_merge.py script is effective for merging STR, SV, and CNV files at the sample level. I am currently exploring options to merge SVs across a cohort to generate a project-level VCF file.

Your preprint (https://doi.org/10.1101/2024.01.02.573821) describes using bcftools and Truvari to merge SVs across samples. However, I am concerned about the large number of ./. (missing) genotypes and the resulting high missing rates. Does your methodology include additional genotyping steps that could mitigate these issues?
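To put a number on the missing-rate concern, here is a minimal sketch that computes, for each record in a merged VCF, the fraction of samples whose GT is fully missing (./., .|. or a bare "."). It assumes plain-text VCF lines with GT present in FORMAT. The function name site_missing_rates is hypothetical, and a half-missing call such as ./1 is deliberately not counted as missing.

```python
from typing import Iterable, List, Tuple

def site_missing_rates(vcf_lines: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (CHROM, POS, fraction of samples with a fully missing GT) per record."""
    rates = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, fmt, samples = fields[0], int(fields[1]), fields[8], fields[9:]
        gt_idx = fmt.split(":").index("GT")  # assumes GT is present in FORMAT
        missing = 0
        for sample in samples:
            values = sample.split(":")
            gt = values[gt_idx] if gt_idx < len(values) else "."
            alleles = gt.replace("|", "/").split("/")
            if all(a == "." for a in alleles):  # ./., .|. or a bare "."
                missing += 1
        rates.append((chrom, pos, missing / len(samples) if samples else 0.0))
    return rates

example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4",
    "chr21\t100\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/1\t./.\t./.\t0/0",
]
print(site_missing_rates(example))  # [('chr21', 100, 0.5)]
```

For a cohort-scale file you would stream the VCF rather than hold it in memory. The per-line logic stays the same.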

Thank you for your insights and time.

Best regards,
Jixin


srbehera · 2024-12-02T19:51:04Z

Hi @jxcao98, sorry for the late response. We used Truvari to merge the STR, SV, and CNV calls within each sample and then used bcftools merge to combine the VCFs across all samples. Merging SVs can indeed be tricky. For the analysis in our study, we did not use any additional genotyping steps.
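The two-stage workflow described above can be sketched as a set of command lines. This is an illustration under stated assumptions, not the exact commands from the preprint, and it is not what dragen_sv_merge.py does internally. The helper names are hypothetical. The bcftools and truvari options used (concat -a, index -t, merge -l/-m, collapse -i/-o/-c) are standard ones. The -m none setting is borrowed from the follow-up comment in this thread.

```python
from typing import List

# Hypothetical helpers: they only build argument lists and run nothing.

def within_sample_commands(sample: str, vcfs: List[str]) -> List[List[str]]:
    """Combine one sample's STR/SV/CNV VCFs, then collapse redundant calls."""
    concat = f"{sample}.concat.vcf.gz"
    return [
        # -a (--allow-overlaps) needs sorted, indexed inputs with identical sample columns.
        ["bcftools", "concat", "-a", "-O", "z", "-o", concat, *vcfs],
        ["bcftools", "index", "-t", concat],
        # Kept (representative) calls go to -o; the calls they absorbed go to -c.
        ["truvari", "collapse", "-i", concat,
         "-o", f"{sample}.merged.vcf", "-c", f"{sample}.collapsed.vcf"],
    ]

def cohort_merge_command(file_list: str, out: str) -> List[str]:
    """Merge per-sample VCFs (one path per line in file_list) into a cohort VCF."""
    # -m none: do not create multiallelic records; differing ALTs stay on separate lines.
    return ["bcftools", "merge", "-l", file_list, "-m", "none", "-O", "z", "-o", out]

if __name__ == "__main__":
    for cmd in within_sample_commands("S1", ["S1.sv.vcf.gz", "S1.cnv.vcf.gz", "S1.str.vcf.gz"]):
        print(" ".join(cmd))
    print(" ".join(cohort_merge_command("filelist.txt", "cohort.vcf.gz")))
```

Each per-sample merged.vcf would still need to be bgzipped and indexed before it is listed in filelist.txt for bcftools merge.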

jxcao98 · 2024-12-03T06:13:49Z

Thanks for your response. I am working on constructing a population-level SV call set for approximately 500K participants in the UK Biobank. I have merged the individual SV calls from all participants into a single VCF file using the command bcftools merge -l ./filelist.txt -m none -0. My next goal is to collapse potentially redundant SVs across the cohort. I know that Truvari can do this, but I have had difficulty running truvari collapse on a dataset this large. I have not been able to run it successfully even on the smallest subset, the DUP calls on chromosome 21, despite minimizing the VCF representation as much as possible. I suspect this is due to its substantial memory requirements.
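To show what "collapsing redundant SVs" means, and how a position-sorted pass can keep memory bounded, here is a deliberately simplified sketch. It is not Truvari's algorithm: truvari collapse also weighs size similarity, sequence similarity, and reference distance, and it handles genotypes. This toy groups only same-type calls with at least 50% reciprocal overlap, and that threshold is an assumption. Because the input is sorted by start, any cluster whose representative ends before the current start can never match again and is retired. Memory therefore scales with the number of locally overlapping calls, and no genotype columns are loaded. The SV and collapse_sorted names are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    name: str

def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by each call's length; return the smaller fraction."""
    ovl = min(a.end, b.end) - max(a.start, b.start)
    if ovl <= 0:
        return 0.0  # also avoids dividing by a zero-length call
    return min(ovl / (a.end - a.start), ovl / (b.end - b.start))

def collapse_sorted(svs: Iterable[SV], min_ro: float = 0.5) -> List[List[SV]]:
    """Greedily cluster position-sorted SVs; each cluster's first call is its representative."""
    done: List[List[SV]] = []
    active: List[List[SV]] = []
    for sv in svs:
        # Retire clusters that cannot overlap this call or any later (sorted) call.
        still_active = []
        for cluster in active:
            rep = cluster[0]
            if rep.chrom != sv.chrom or rep.end <= sv.start:
                done.append(cluster)
            else:
                still_active.append(cluster)
        active = still_active
        for cluster in active:
            rep = cluster[0]
            if rep.svtype == sv.svtype and reciprocal_overlap(rep, sv) >= min_ro:
                cluster.append(sv)  # treat sv as redundant with the representative
                break
        else:
            active.append([sv])  # no match: sv starts a new cluster
    done.extend(active)
    return done

svs = [
    SV("chr21", 1000, 2000, "DUP", "a"),
    SV("chr21", 1010, 2005, "DUP", "b"),
    SV("chr21", 1500, 5000, "DUP", "c"),
    SV("chr21", 9000, 9500, "DUP", "d"),
]
print([[sv.name for sv in cluster] for cluster in collapse_sorted(svs)])
# [['a', 'b'], ['c'], ['d']]
```

Here b is absorbed into a's cluster (about 99% reciprocal overlap). Call c overlaps a by only about 14% of c's length, so it stays separate.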

I am also wondering whether the bcftools merge -0 option (--missing-to-ref, which assumes genotypes at missing sites are 0/0) is necessary. Without it, the merged VCF contains a large number of ./. genotypes, which may require additional re-genotyping, and that can be quite costly.
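The effect of -0 can be illustrated with a toy model of how a merge fills in samples that have no record at a site. Without -0, such samples get ./.; with -0, they get 0/0. Genotypes that a sample actually reported are left unchanged. This is a simplified sketch: the function name and the (CHROM, POS, REF, ALT) site key are assumptions, diploid genotypes are assumed, and real bcftools record matching has more rules.

```python
from typing import Dict, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), a simplified site key

def merge_genotypes(
    per_sample: Dict[str, Dict[Site, str]], missing_to_ref: bool = False
) -> Dict[Site, Dict[str, str]]:
    """Toy model of filling samples that have no record at a merged site."""
    # No record -> "./." by default, or "0/0" when mimicking -0 / --missing-to-ref.
    fill = "0/0" if missing_to_ref else "./."
    sites = sorted({site for calls in per_sample.values() for site in calls})
    return {
        site: {sample: calls.get(site, fill) for sample, calls in per_sample.items()}
        for site in sites
    }

per_sample = {
    "S1": {("chr21", 100, "N", "<DUP>"): "0/1"},
    "S2": {("chr21", 250, "N", "<DUP>"): "1/1"},
}
print(merge_genotypes(per_sample)[("chr21", 100, "N", "<DUP>")])        # {'S1': '0/1', 'S2': './.'}
print(merge_genotypes(per_sample, True)[("chr21", 100, "N", "<DUP>")])  # {'S1': '0/1', 'S2': '0/0'}
```

The trade-off is that 0/0 records "no call in this sample's VCF" as "homozygous reference". That is only correct if the sample really was reference there. A missed SV, or the same SV called with slightly different breakpoints that -m none does not join, would also become 0/0. So -0 lowers the apparent missing rate without adding any genotyping evidence.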

Lastly, I would like to understand best practices for using bcftools merge and truvari collapse in DRAGEN workflows, particularly for merging samples across a population and minimizing redundant calls.
