Commit

Merge pull request nf-core#1267 from grantn5/add_bcftools_annotate
Add bcftools annotate
grantn5 authored Oct 18, 2023
2 parents 2d02bc0 + 33464b2 commit 2336ae3
Showing 18 changed files with 354 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -62,6 +62,7 @@ A lake near the Rapaselet delta.
### Added

- [#1231](https://github.com/nf-core/sarek/pull/1231) - Back to dev
- [#1244](https://github.com/nf-core/sarek/pull/1244) - Add bcf annotate module

### Changed

3 changes: 2 additions & 1 deletion README.md
@@ -50,7 +50,7 @@ Depending on the options and samples provided, the pipeline can currently perfor
- `Sentieon Haplotyper`
- `Strelka2`
- `TIDDIT`
- Variant filtering and annotation (`SnpEff`, `Ensembl VEP`)
- Variant filtering and annotation (`SnpEff`, `Ensembl VEP`, `BCFtools annotate`)
- Summarise and represent QC (`MultiQC`)

<p align="center">
@@ -131,6 +131,7 @@ We thank the following people for their extensive assistance in the development
- [Francesco Lescai](https://github.com/lescai)
- [Gavin Mackenzie](https://github.com/GCJMackenzie)
- [Gisela Gabernet](https://github.com/ggabernet)
- [Grant Neilson](https://github.com/grantn5)
- [gulfshores](https://github.com/gulfshores)
- [Harshil Patel](https://github.com/drpatelh)
- [James A. Fellows Yates](https://github.com/jfy133)
15 changes: 14 additions & 1 deletion conf/modules/annotate.config
@@ -60,6 +60,19 @@ process {
}
}

// BCFTOOLS ANNOTATE
if (params.tools && params.tools.split(',').contains('bcfann')) {
withName: 'NFCORE_SAREK:SAREK:VCF_ANNOTATE_ALL:VCF_ANNOTATE_BCFTOOLS:BCFTOOLS_ANNOTATE' {
ext.args = '--output-type z'
ext.prefix = { "${vcf.baseName.minus(".vcf")}_BCF.ann" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/annotation/${meta.variantcaller}/${meta.id}/" },
pattern: "*{gz}"
]
}
}

// SNPEFF THEN VEP
if (params.tools && params.tools.split(',').contains('merge')) {
withName: "NFCORE_SAREK:SAREK:VCF_ANNOTATE_ALL:VCF_ANNOTATE_MERGE:ENSEMBLVEP_VEP" {
@@ -69,7 +82,7 @@ process {
}

// ALL ANNOTATION TOOLS
if (params.tools && (params.tools.split(',').contains('snpeff') || params.tools.split(',').contains('vep') || params.tools.split(',').contains('merge'))) {
if (params.tools && (params.tools.split(',').contains('snpeff') || params.tools.split(',').contains('vep') || params.tools.split(',').contains('merge') || params.tools.split(',').contains('bcfann'))) {
withName: "NFCORE_SAREK:SAREK:VCF_ANNOTATE_ALL:.*:(TABIX_BGZIPTABIX|TABIX_TABIX)" {
ext.prefix = { input.name - ".vcf" }
publishDir = [
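
The two Groovy expressions this config adds, the `params.tools` check and the `ext.prefix` closure, can be traced by hand. The Python sketch below mirrors them. It assumes that Nextflow's `baseName` strips only the final extension and that Groovy's `minus` removes the first occurrence of its argument. The input file name is hypothetical.

```python
def tools_enabled(tools, name):
    """Mirror of `params.tools && params.tools.split(',').contains(name)`."""
    # An unset (None) or empty tools string disables every tool-specific block.
    return bool(tools) and name in tools.split(",")


def bcfann_prefix(vcf_name):
    """Mirror of `"${vcf.baseName.minus(".vcf")}_BCF.ann"`."""
    # baseName: drop only the last extension (".gz" for a bgzipped VCF).
    base_name = vcf_name.rsplit(".", 1)[0] if "." in vcf_name else vcf_name
    # minus(".vcf"): remove the first occurrence of ".vcf", then add the suffix.
    return base_name.replace(".vcf", "", 1) + "_BCF.ann"


# Hypothetical input name, not taken from the pipeline's test data.
print(tools_enabled("snpeff,bcfann", "bcfann"))
print(bcfann_prefix("sample1.strelka.vcf.gz"))
```

Because the check compares whole comma-separated entries, a value such as `bcfannotate` would not enable the block.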
3 changes: 3 additions & 0 deletions conf/test.config
@@ -36,6 +36,9 @@ params {
vep_cache_version = 110
vep_genome = 'WBcel235'
vep_species = 'caenorhabditis_elegans'
bcftools_annotations = "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/vcf/test2.vcf.gz"
bcftools_annotations_index = "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/illumina/vcf/test2.vcf.gz.tbi"
bcftools_header_lines = "${projectDir}/tests/config/bcfann_test_header.txt"

// default params
split_fastq = 0 // no FASTQ splitting
12 changes: 12 additions & 0 deletions docs/output.md
@@ -886,6 +886,18 @@ plus any additional fields selected via the plugins: [dbNSFP](https://sites.googl

</details>

### BCFtools annotate

[BCFtools annotate](https://samtools.github.io/bcftools/bcftools.html#annotate) is used to add annotations to VCF files. The annotations are written to the INFO column, and the VCF header is updated to describe them. For further reading and documentation, see the [BCFtools annotate manual](https://samtools.github.io/bcftools/bcftools.html#annotate).

<details markdown="1">
<summary>Output files for all samples</summary>

- `{sample,tumorsample_vs_normalsample}.<variantcaller>_BCF.ann.vcf.gz` and `{sample,tumorsample_vs_normalsample}.<variantcaller>_BCF.ann.vcf.gz.tbi`
- VCF with tabix index

</details>

## Quality control and reporting

### Quality control
8 changes: 8 additions & 0 deletions docs/usage.md
@@ -1048,6 +1048,14 @@ Enable with `--vep_spliceregion`.
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#spliceregion) and [here](https://www.ensembl.info/2018/10/26/cool-stuff-the-vep-can-do-splice-site-variant-annotation/)."
### BCFtools annotate
It is possible to annotate a VCF file with a custom annotation file using [BCFtools annotate](https://samtools.github.io/bcftools/bcftools.html#annotate). To do so, add `bcfann` to the `--tools` list and set the following parameters:
- `--bcftools_annotations`: path to the VCF annotation file
- `--bcftools_annotations_index`: path to the index of the VCF annotation file
- `--bcftools_header_lines`: path to the file containing the header lines to add
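
As a rough illustration of how these settings fit together, the Python sketch below writes a header lines file and assembles a pipeline invocation without running it. The parameter names come from `nextflow.config` in this commit. The INFO tag, file names, profile, samplesheet and output directory are hypothetical placeholders. It also assumes the header file holds one VCF header line per row, as `bcftools annotate --header-lines` expects.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical INFO tag; use the tags actually present in your annotation VCF.
HEADER_LINE = (
    '##INFO=<ID=EXAMPLE_TAG,Number=1,Type=String,'
    'Description="Example custom annotation">'
)


def write_header_lines(path):
    """Write one VCF header line per row to `path` and return it."""
    path.write_text(HEADER_LINE + "\n")
    return path


def build_command(annotations, index, header_lines):
    """Assemble (but do not run) a sarek invocation that enables bcfann."""
    return [
        "nextflow", "run", "nf-core/sarek",
        "-profile", "docker",          # assumption: any supported profile
        "--input", "samplesheet.csv",  # hypothetical samplesheet
        "--outdir", "results",         # hypothetical output directory
        "--tools", "bcfann",
        "--bcftools_annotations", annotations,
        "--bcftools_annotations_index", index,
        "--bcftools_header_lines", header_lines,
    ]


workdir = Path(tempfile.mkdtemp())
header_file = write_header_lines(workdir / "header_lines.txt")
command = build_command(
    "annotations.vcf.gz", "annotations.vcf.gz.tbi", str(header_file)
)
print(shlex.join(command))
```

The printed line can be pasted into a shell once the placeholder paths are replaced with real files.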
## MultiQC related issues
### Plots for SnpEff are missing
6 changes: 6 additions & 0 deletions modules.json
@@ -10,6 +10,12 @@
"git_sha": "603ecbd9f45300c9788f197d2a15a005685b4220",
"installed_by": ["modules"]
},
"bcftools/annotate": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"],
"patch": "modules/nf-core/bcftools/annotate/bcftools-annotate.diff"
},
"bcftools/concat": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
89 changes: 89 additions & 0 deletions modules/nf-core/bcftools/annotate/bcftools-annotate.diff

Some generated files are not rendered by default.

70 changes: 70 additions & 0 deletions modules/nf-core/bcftools/annotate/main.nf

Some generated files are not rendered by default.

56 changes: 56 additions & 0 deletions modules/nf-core/bcftools/annotate/meta.yml

Some generated files are not rendered by default.

3 changes: 3 additions & 0 deletions nextflow.config
@@ -95,6 +95,9 @@ params {
vep_out_format = "vcf"
vep_spliceai = null // spliceai plugin disabled within VEP
vep_spliceregion = null // spliceregion plugin disabled within VEP
bcftools_annotations = null // No extra annotation file
bcftools_annotations_index = null // No extra annotation file index
bcftools_header_lines = null // No header lines to be added to the VCF file

// MultiQC options
multiqc_config = null
14 changes: 13 additions & 1 deletion nextflow_schema.json
@@ -100,7 +100,7 @@
"fa_icon": "fas fa-toolbox",
"description": "Tools to use for duplicate marking, variant calling and/or for annotation.",
"help_text": "Multiple tools separated with commas.\n\n**Variant Calling:**\n\nGermline variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit\n\nTumor-only somatic variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, mpileup, Mutect2, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit, ControlFREEC\n\nSomatic variant calling can currently only be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, Mutect2, Strelka2\n- Structural variants: Manta, TIDDIT\n- Copy-Number: ASCAT, CNVKit, Control-FREEC\n- Microsatellite Instability: MSIsensorpro\n\n> **NB** Mutect2 for somatic variant calling cannot be combined with `--no_intervals`\n\n**Annotation:**\n \n- snpEff, VEP, merge (both consecutively).\n\n> **NB** As Sarek will use bgzip and tabix to compress and index VCF files annotated, it expects VCF files to be sorted when starting from `--step annotate`.",
"pattern": "^((ascat|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
"pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
},
"skip_tools": {
"type": "string",
@@ -561,6 +561,18 @@
"help_text": "Sets the format of the output-file from VEP. Available formats: json, tab and vcf.",
"fa_icon": "fas fa-table",
"hidden": true
},
"bcftools_annotations": {
"type": "string",
"fa_icon": "fas fa-file"
},
"bcftools_annotations_index": {
"type": "string",
"fa_icon": "fas fa-file"
},
"bcftools_header_lines": {
"type": "string",
"fa_icon": "fas fa-align-center"
}
}
},
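
The effect of the `pattern` change on `--tools` validation can be checked with a short script. The Python sketch below rebuilds both regular expressions from the tool lists shown in the diff and tests a few values. It assumes Python's `re` module treats this pattern the same way as the schema validator, which holds for the constructs used here (alternation, optional groups and a negative lookbehind).

```python
import re

# Tool names copied from the schema pattern before this commit.
OLD_TOOLS = [
    "ascat", "cnvkit", "controlfreec", "deepvariant", "freebayes",
    "haplotypecaller", "sentieon_dnascope", "sentieon_haplotyper", "manta",
    "merge", "mpileup", "msisensorpro", "mutect2", "sentieon_dedup",
    "snpeff", "strelka", "tiddit", "vep",
]
# The commit inserts "bcfann" right after "ascat".
NEW_TOOLS = OLD_TOOLS[:1] + ["bcfann"] + OLD_TOOLS[1:]


def build_pattern(tools):
    """Rebuild the schema regex: comma-separated tools, no trailing comma."""
    return re.compile(r"^((" + "|".join(tools) + r")?,?)*(?<!,)$")


def is_valid(pattern, value):
    """Return True if `value` passes the given `--tools` pattern."""
    return pattern.match(value) is not None


old_pattern = build_pattern(OLD_TOOLS)
new_pattern = build_pattern(NEW_TOOLS)

for value in ["bcfann", "snpeff,bcfann", "bcfann,", "bcftools"]:
    print(value, is_valid(old_pattern, value), is_valid(new_pattern, value))
```

Only the updated pattern accepts `bcfann`. A trailing comma or an unknown name such as `bcftools` is rejected by both.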