Commit

Merge pull request #23 from nf-core/dev
Last changes to release 1.0.0
carpanz authored Oct 24, 2022
2 parents fa9b7c1 + 6e32399 commit 4cd916f
Showing 5 changed files with 23 additions and 13 deletions.
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -22,10 +22,18 @@

> Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2015). “Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data.” Bioinformatics, btv566
- [BAMTOOLS](https://github.com/pezmaster31/bamtools)

> Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics. 2011 Jun 15;27(12):1691-2. doi: 10.1093/bioinformatics/btr174. Epub 2011 Apr 14. PubMed PMID: 21493652; PubMed Central PMCID: PMC3106182.
- [BWA](http://bio-bwa.sourceforge.net)

> Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
- [BWAmem2](https://github.com/bwa-mem2/bwa-mem2)

> Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.
- [SAMtools](https://www.htslib.org)

> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PMID: 19505943; PMCID: PMC2723002.
5 changes: 3 additions & 2 deletions README.md
@@ -13,8 +13,9 @@

## Introduction

**nf-core/hgtseq** is a bioinformatics best-practice analysis pipeline built to investigate horizontal gene transfer from NGS data. hgtseq inputs can be either fastq files, which are then mapped to the proper host reference with BWA, or mapped BAM files. Unmapped reads are then extracted with SAMtools view based on their SAM flag classification: two separate files are generated, depending on whether both mates are unmapped, or just one is. Taxonomic classification is then performed with Kraken2. A rich set of visualisations completes the
pipeline, accompanying the results with interactive Krona plots as well as Circos-like plots generated with R, aimed at better annotating potential integration sites in the host genome
**nf-core/hgtseq** is a bioinformatics best-practice analysis pipeline built to investigate horizontal gene transfer from NGS data.

The pipeline uses metagenomic classification of paired-read alignments against a reference genome to identify the presence of non-host microbial sequences within read pairs, and to infer potential integration sites into the host genome.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
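The README wording in this hunk refers to separating unmapped reads by SAM flag, depending on whether both mates are unmapped or just one is. As a minimal sketch of that idea (not the pipeline's actual implementation), the flag bits defined in the SAM specification can be tested as follows. The function name and the category labels are hypothetical and chosen only for illustration.

```python
# Flag bits as defined in the SAM specification.
FLAG_READ_UNMAPPED = 0x4   # this read is unmapped
FLAG_MATE_UNMAPPED = 0x8   # the mate of this read is unmapped


def classify_read(flag: int) -> str:
    """Label a read from its SAM flag.

    The labels are hypothetical; they only mirror the two categories
    described in the README (both mates unmapped vs. only one unmapped).
    """
    read_unmapped = bool(flag & FLAG_READ_UNMAPPED)
    mate_unmapped = bool(flag & FLAG_MATE_UNMAPPED)
    if read_unmapped and mate_unmapped:
        return "both_unmapped"
    if read_unmapped:
        return "single_unmapped"
    return "mapped"


if __name__ == "__main__":
    for flag in (77, 69, 99):
        print(flag, classify_read(flag))
```

Reads in the "single_unmapped" category are the ones whose mapped mate can hint at a potential integration site in the host genome.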

2 changes: 1 addition & 1 deletion assets/samplesheet_fastq.csv
@@ -1,3 +1,3 @@
sample,input1,input2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001_1.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001_2.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq,
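To illustrate how a samplesheet in this three-column layout could be read, here is a small sketch using only the Python standard library. It assumes that an empty `input2` field marks a single-end sample, as the example row names suggest; this is an assumption for illustration and not necessarily how the pipeline itself interprets the file. The function name `classify_samples` is hypothetical.

```python
import csv
import io


def classify_samples(csv_text: str) -> dict[str, str]:
    """Map each sample name to 'paired' or 'single'.

    Assumption: a row with an empty `input2` column is single-end.
    """
    layouts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        second = (row.get("input2") or "").strip()
        layouts[row["sample"]] = "paired" if second else "single"
    return layouts


EXAMPLE = """sample,input1,input2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001_1.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001_2.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq,
"""

if __name__ == "__main__":
    print(classify_samples(EXAMPLE))
```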

14 changes: 7 additions & 7 deletions docs/usage.md
@@ -16,7 +16,7 @@ The second category, i.e. unmapped reads whose mate is mapped, provide the oppor

## Input Formats

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the input file. This file can have at least two or three columns according to the format of reads used, i.e. two columns for BAM files and three for FASTQ files (as defined in the tables below).
The input file has two or three columns depending on the format of the reads used, i.e. two columns for BAM files and three for FASTQ files (as defined in the tables below).

### FASTQ

@@ -101,21 +101,21 @@ This version number will be logged in reports when you run the pipeline, so that
Please note that, in addition to the classic parameters such as `--input` and `--outdir`, the pipeline requires other specific parameters.

### `--genome`
### --genome

The user must specify the genome of interest. A list of genomes is available in the pipeline in the file conf/igenomes.config, which contains Illumina iGenomes reference file paths. This follows [nf-core guidelines](https://nf-co.re/usage/reference_genomes) for reference management, and sets all necessary parameters (like fasta, gtf, bwa). Users are encouraged to rely primarily on the _genome_ parameter, and can follow the instructions on [this](https://nf-co.re/usage/reference_genomes#adding-paths-to-a-config-file) page to add genomes not currently included in the repository. All parameters that are set automatically as a consequence, though hidden, can still be set on the command line should finer control be needed.

### `--taxonomy_id`
### --taxonomy_id

Since the code in the report is executed differently based on the taxonomy id of the analyzed species, the user must enter it in the command line (must be taken from the Taxonomy Database of NCBI).

### `--krakendb`
### --krakendb

User must provide a Kraken2 database in order to perform the classification.
User must provide a Kraken2 database in order to perform the classification. Can optionally be in a `.tar.gz` archive.

### `--kronadb`
### --kronadb

User must also provide a Krona database in order to generate interactive pie charts with Kronatools.
User must also provide a Krona database in order to generate interactive pie charts with Kronatools. Can optionally be in a `.tar.gz` archive.
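The updated wording for `--krakendb` and `--kronadb` says the databases can optionally be supplied as a `.tar.gz` archive. A rough sketch of how such a value might be distinguished from an uncompressed path is shown below. The helper name and the example paths are hypothetical, and the check relies only on the file name suffix; the pipeline's own logic may differ.

```python
def needs_extraction(db_path: str) -> bool:
    """Return True when the path or URL looks like a .tar.gz archive.

    Only the suffix is inspected; nothing is opened or downloaded.
    """
    return db_path.rstrip("/").endswith(".tar.gz")


if __name__ == "__main__":
    # Hypothetical example values, not real database locations.
    for candidate in ("/data/kraken_db.tar.gz", "/data/kraken_db", "/data/taxonomy.tab"):
        print(candidate, needs_extraction(candidate))
```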

## Core Nextflow arguments

7 changes: 4 additions & 3 deletions nextflow_schema.json
@@ -64,7 +64,8 @@
"aligner": {
"type": "string",
"default": "bwa-mem",
"description": "Choose if aligner should be bwa-mem or bwa-mem2"
"description": "Choose if aligner should be bwa-mem or bwa-mem2",
"enum": ["bwa-mem", "bwa-mem2"]
},
"multiqc_runkraken": {
"type": "boolean",
@@ -114,12 +115,12 @@
"krakendb": {
"type": "string",
"default": "None",
"description": "Either a local path or a URL to compressed kraken database folder"
"description": "A local path to kraken database folder or compressed database file, or a URL to a compressed database file, in tar.gz format"
},
"kronadb": {
"type": "string",
"default": "None",
"description": "Either a local path or a URL to compressed .tab krona taxonomy file"
"description": "A local path or a URL to a .tab krona taxonomy file; it can also receive a compressed .tab file in tar.gz format"
},
"gff": {
"type": "string",
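The `enum` added to the `aligner` entry restricts the parameter to the two listed values. The sketch below shows, under simplifying assumptions, what that constraint means in practice: it embeds only the `aligner` fragment from this hunk and checks a value against it by hand. The `check_enum` helper is hypothetical and stands in for whatever schema validation the pipeline tooling actually performs.

```python
import json

# Only the `aligner` entry from the schema hunk, reproduced for illustration.
SCHEMA_FRAGMENT = json.loads("""
{
  "aligner": {
    "type": "string",
    "default": "bwa-mem",
    "description": "Choose if aligner should be bwa-mem or bwa-mem2",
    "enum": ["bwa-mem", "bwa-mem2"]
  }
}
""")


def check_enum(name: str, value: str, schema: dict = SCHEMA_FRAGMENT) -> str:
    """Return the value if the schema entry allows it, else raise ValueError."""
    allowed = schema[name].get("enum")
    if allowed is not None and value not in allowed:
        raise ValueError(f"--{name} must be one of {allowed}, got {value!r}")
    return value


if __name__ == "__main__":
    print(check_enum("aligner", "bwa-mem2"))
```

With the enum in place, a misspelled aligner name can be rejected at parameter validation time rather than surfacing later as a failed process.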
