From d460daf71b61939f9d163c0edb64539929d8191c Mon Sep 17 00:00:00 2001 From: FranBonath Date: Wed, 22 Nov 2023 15:19:16 +0100 Subject: [PATCH 1/9] adding exercises to training --- .../contributing/nf_core_basic_training.md | 335 ++++++++++++++++-- 1 file changed, 314 insertions(+), 21 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 9f9e04e6b2..d458ae244d 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -229,8 +229,61 @@ Ideally code should be developed on feature branches (i.e. a new branch made wit When creating a new repository on GitHub, create it as an empty repository without a README or any other file. Then push the repo with the template of your new pipeline from your local clone. -( OLD: When creating a new repository on https://github.com or equivalent, don’t initialise it - leave it bare and push everything from your local clone -Develop your code on either the master or dev branches and leave TEMPLATE alone.) +:::tip{title="Exercise 1 - Getting around the git environment"} + +1. Create and switch to a new git branch called `demo`. +
+ solution 1 + + ```bash + git checkout -b demo + ``` + +
+ +2. Display all available git branches. +
+ solution 2 + + ```bash + git branch + ``` + +
+ +3. Create a directory within the new pipeline directory called `results` and add it to the `.gitignore` file. +
+ solution 3 + + ```bash + mkdir results + ``` + + ```groovy title=".gitignore" + .nextflow* + work/ + data/ + results/ + .DS_Store + testing/ + testing* + *.pyc + results/ + ``` + +
+ +4. Commit the changes you have made. +
+ solution 4 + + ```bash + git add . + git commit -m "creating results dir and adding it to gitignore" + ``` + +
+ ::: ## Run the new pipeline @@ -390,7 +443,7 @@ This file keeps track of modules installed using nf-core tools from the nf-core/ e) _.gitpod.yml_ -This file provides settings to create a Cloud development environment in your browser using Gitpod. It comes installed with the tools necessary to develop and test nf-core pipelines, modules, and subworkflows, allowing you to develop from anywhere without installing anything locally. + This file provides settings to create a Cloud development environment in your browser using Gitpod. It comes installed with the tools necessary to develop and test nf-core pipelines, modules, and subworkflows, allowing you to develop from anywhere without installing anything locally. f) _.nf-core.yml_ @@ -400,6 +453,43 @@ This file provides settings to create a Cloud development environment in your br i) _.prettierrc.yml_ +:::tip{title="Exercise 2 - Test your knowledge of the nf-core pipeline structure"} + +1. In which directory can you find the main script of the nf-core module `fastqc` +
+ solution 1 + + ``` + modules/nf-core/fastqc/ + ``` + +
+ +2. Which file contains the main workflow of your new pipeline? +
+ solution 2 + + ``` + workflows/demotest.nf + ``` + +
+ +3. `check_samplesheet.py` is a script that can be called by any module of your pipeline, where is it located? +
+ solution 3 + + ``` + bin/ + ``` + + This directory can also contain a custom scripts that you may wish to call from within a custom module. + +
+ +[MORE QUESTIONS CAN BE ADDED HERE] +::: + ## Customising the template In many of the files generated by the nf-core template, you’ll find code comments that look like this: @@ -514,6 +604,78 @@ nf-core lint [...] ``` +:::tip{title="Exercise 3 - ToDos and linting"} + +1. Add the following bullet point list to the README file, where the ToDo indicates to describe the default steps to execute the pipeline + + ```groovy title="pipeline overview" + - Indexing of a transcriptome file + - Quality control + - Quantification of transcripts + - [whatever the custom script does] + - Generation of a MultiQC report + ``` + +
+ solution 1 + + ```bash title="README.md" + [...] + + ## Introduction + + **nf-core/a** is a bioinformatics pipeline that ... + + + + + + Default steps: + - Indexing of a transcriptome file + - Quality control + - Quantification of transcripts + - [whatever the custom script does] + - Generation of a MultiQC report + + 1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)) + 2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/)) + + [...] + + ``` + +
+ +2. Lint the changes you have made +
+ solution 2 + + ```bash + nf-core lint + ``` + + You should see that we now get one less `warning` in our lint overview, since we removed one of the TODO items. + +
+ +3. Commit your changes +
+ solution 3 + + ```bash + git add . + git commit -m "adding pipeline overview to pipeline README" + ``` + +
+ + ::: + # Adding Modules to a pipeline ## Adding an existing nf-core module @@ -601,30 +763,53 @@ nf-core modules info salmon/index nf-core/tools version 2.10 - https://nf-co.re - -╭─ Module: salmon/index ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮ -│ 🌐 Repository: https://github.com/nf-core/modules.git │ -│ 🔧 Tools: salmon │ -│ 📖 Description: Create index for salmon │ +╭─ Module: salmon/index ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮ +│ 🌐 Repository: https://github.com/nf-core/modules.git │ +│ 🔧 Tools: salmon │ +│ 📖 Description: Create index for salmon │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ - ╷ ╷ - 📥 Inputs │Description │Pattern +╷ ╷ +📥 Inputs │Description │Pattern ╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━╸ - genome_fasta (file) │Fasta file of the reference genome │ +genome_fasta (file) │Fasta file of the reference genome │ ╶────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────┼───────╴ - transcriptome_fasta (file)│Fasta file of the reference transcriptome │ - ╵ ╵ - ╷ ╷ - 📤 Outputs │Description │ Pattern +transcriptome_fasta (file)│Fasta file of the reference transcriptome │ +╵ ╵ +╷ ╷ +📤 Outputs │Description │ Pattern ╺━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━╸ - index (directory)│Folder containing the star index files │ salmon +index (directory)│Folder containing the star index files │ salmon ╶───────────────────┼───────────────────────────────────────────────────────────────────────���──────────────────────────┼────────────╴ - versions (file) │File containing software versions │versions.yml - ╵ ╵ +versions (file) │File containing software versions │versions.yml +╵ ╵ + +💻 Installation command: nf-core modules install salmon/index - 💻 Installation command: nf-core modules install salmon/index ``` +:::tip{title="Exercise 4 - Identification of available nf-core modules"} + +1. Check which versions are available for the nf-core module `salmon/quant`. +
+ solution 1 + + ``` + nf-core modules info salmon/quant + ``` + +
+ +2. Is there any version of `salmon/quant` already installed locally? +
+ solution 2 + + ``` + nf-core modules list local + ``` + +
+ ::: + The out put from the info command will among other things give you the nf-core/tools installation command, lets see what it is doing: ```bash @@ -685,6 +870,91 @@ INFO Use the following statement to include this module: exercise to add a different module would be nice! => salmon/quant! comparison to simple nextflow pipeline from the basic Nextflow training would be nice!) +:::tip{title="Exercise 5 - Installing a remote module from nf-core"} + +1. Install the nf-core module `salmon/quant` version `?` +
+ solution 1 + + ``` + ``` + +
+ +2. Which file(s) were/are added and what does it / do they do? +
+ solution 2 + + ``` + ``` + +
+ +3. Import the installed `salmon/quant` pipeline into your main workflow. +
+ solution 3 + + ``` + ``` + +
+ +4. Call the `SALMON_QUANT` process in your workflow +
+ solution 4 + + ``` + ``` + +
+ +5. Add required parameters for `salmon/quant`to the `SALMON_QUANT` process +
+ solution 5 + + ``` + ``` + +
+ +6. Include the quantification results in the multiQC input +
+ solution 6 + + ``` + ``` + +
+ +7. Lint your pipeline +
+ solution 7 + + ``` + ``` + +
+ +8. Run the pipeline and inspect the results +
+ solution 8 + + ``` + ``` + +
+ +9. Commit the changes +
+ solution 9 + + ``` + ``` + +
+ +::: + ## Adding a remote module If there is no nf-core module available for the software you want to include, the nf-core tools package can also aid in the generation of a remote module that is specific for your pipeline. To add a remote module run the following: @@ -697,6 +967,25 @@ Open ./modules/local/demo/module.nf and start customising this to your needs whi ### Making a remote module for a custom script +:::tip{title="Exercise 6 - Adding a custom module"} +In the directory `exercise_6` you will find the custom script `print_hello.py`, which will be used for this and the next exercise. + +1. Create a local module that runs the `print_hello.py` script +2. Add the module to your main workflow +3. Run the pipeline +4. Lint the pipeline +5. Commit your changes +
+ solution 1 + + ``` + + ``` + +
+ +::: + ## Lint all modules As well as the pipeline template you can lint individual or all modules with a single command: @@ -751,6 +1040,10 @@ Here in the schema editor you can edit: - Special formats for strings, such as file-path - Additional fields for files such as mime-type -``` +:::tip{title="Exercise 7 - Using nextflow schema to add command line parameters"} -``` +1. Feed a string to your custom script from exercise 6 from the command line. Use `nf-core schema build` to add the parameter to the `nextflow.config` file. + + + +::: From bcfd0b9cd9dfd90a82bf21e29bd3b03a66635a77 Mon Sep 17 00:00:00 2001 From: Raquel Manzano Date: Wed, 29 Nov 2023 16:15:45 +0000 Subject: [PATCH 2/9] Added custom script stuff --- .../contributing/nf_core_basic_training.md | 83 +++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 345e056dcb..a89f27dedd 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -160,6 +160,12 @@ nf-core lint # Adding Modules to a pipeline +A module is a single `process` built to be reusable and self-contained so it can be used within different Nextflow pipelines. They encapsulate a specific function or task, for example running a single tool such as [`FastQC`](https://github.com/nf-core/modules/blob/master/modules/nf-core/fastqc/main.nf). You can import and use modules like functions in a Nextflow subworkflow, this makes your workflow more readable and maintainable. + +In nf-core modules are also standarised, i.e. they follow certain rules to optimise reusability and compatibility between pipelines. Each module consists of two files: a `main.nf` script containing the module's code and a `meta.yml` file that provides general information about the module and defines its inputs and outputs, although this last one can be optional. You might also find other optional files such as `environment.yml`, which contains the packages to be installed with `conda` and a folder called `tests`, which contains the necessary files to perform validation on the module with [`nf-test`](https://code.askimed.com/nf-test/). + +You can find a list of nf-core modules available in the [`modules/`](https://github.com/nf-core/modules/tree/master/modules) directory of nf-core/modules along with the required documentation and tests. + ## Adding an existing nf-core module ### Identify available nf-core modules @@ -214,6 +220,83 @@ Open ./modules/local/demo/module.nf and start customising this to your needs whi ### Making a remote module for a custom script +To generate a module for a custom script you need to follow the same steps when adding a remote module. Then, you can supply the command for your script in the `script` block but your script needs to be present and *executable* in the `bin` folder of the pipeline. In the nf-core pipelines, this folder is in the main directory and you can see in [`rnaseq`](https://github.com/nf-core/rnaseq). Let's look at a example in this pipeline, for instance [`tximport.r`](https://github.com/nf-core/rnaseq/blob/master/bin/tximport.r). This is an Rscript present in the [`bin`](https://github.com/nf-core/rnaseq/tree/master/bin) of the pipeline. We can find the module that runs this script in [`modules/local/tximport`](https://github.com/nf-core/rnaseq/blob/master/modules/local/tximport/main.nf). 
As we can see the script is being called in the `script` block, note that `tximport.r` is being executed as if it was called from the command line and therefore needs to be *executable*. + +Let's create a simple custom script that converts a MAF file to a BED file called `maf2bed.py`: + +``` +#!/usr/bin/env python +""" +Author: Raquel Manzano - @RaqManzano +Script: Convert MAF to BED format keeping ref and alt info +""" +import argparse +import pandas as pd + + +def argparser(): + parser = argparse.ArgumentParser(description="") + parser.add_argument("-maf", "--mafin", help="MAF input file", required=True) + parser.add_argument("-bed", "--bedout", help="BED input file", required=True) + parser.add_argument( + "--extra", help="Extra columns to keep (space separated list)", nargs="+", required=False, default=[] + ) + return parser.parse_args() + + +def maf2bed(maf_file, bed_file, extra): + maf = pd.read_csv(maf_file, sep="\t", comment="#") + bed = maf[["Chromosome", "Start_Position", "End_Position"] + extra] + bed.to_csv(bed_file, sep="\t", index=False, header=False) + + +def main(): + args = argparser() + maf2bed(maf_file=args.mafin, bed_file=args.bedout, extra=args.extra) + + +if __name__ == "__main__": + main() + +``` + +Now, let's go back to our module: + + + +``` +process CUSTOM_SCRIPT { + tag "$meta.id" + label 'process_single' + + conda "anaconda::pandas=1.4.3" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.4.3' : + 'quay.io/biocontainers/pandas:1.4.3' }" + + input: + tuple val(meta), path(maf) + + output: + tuple val(meta), path('*.bed') , emit: bed + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: // This script is bundled with the pipeline in bin + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ +maf2bed.py --mafin $maf --bedout ${prefix}.bed + """ +``` + +We are now able to use our custom script through the module + + + ## Lint all modules As well as the pipeline template you can lint individual or all modules with a single command: From 1673547d5f8433a42374fe5f881670c3632d9931 Mon Sep 17 00:00:00 2001 From: Mahesh Binzer-Panchal Date: Wed, 8 Nov 2023 10:12:30 +0000 Subject: [PATCH 3/9] Add learning objectives and key points --- .../contributing/nf_core_basic_training.md | 32 ++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index d458ae244d..5fc4773207 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -5,6 +5,23 @@ subtitle: A guide to create Nextflow pipelines using nf-core tools # Introduction +## Scope + +- How do I create a pipeline using nf-core tools? +- How do I incorporate modules from nf-core modules? +- How can I use custom code in my pipeline? + +:::note +### Learning objectives + +- The learner will create a simple pipeline using the nf-core template. +- The learner will identify key files in the pipeline. +- The learner will lint their pipeline code to identify work to be done. +- The learner will incorporate modules from nf-core/modules into their pipeline. +- The learner will add custom code as a local module into their pipeline. +- The learner will build an nf-core schema to describe and validate pipeline parameters. 
+::: + This training course aims to demonstrate how to build an nf-core pipeline using the nf-core pipeline template and nf-core modules as well as custom, local modules. Be aware that we are not going to explain any fundamental Nextflow concepts, as such we advise anyone taking this course to have completed the [Basic Nextflow Training Workshop](https://training.nextflow.io/). ```md @@ -802,7 +819,7 @@ versions (file) │File containing software versions │versions.yml 2. Is there any version of `salmon/quant` already installed locally?
solution 2 - + ``` nf-core modules list local ``` @@ -1047,3 +1064,16 @@ Here in the schema editor you can edit:
::: + +:::note +### Key points + +- `nf-core create ` creates a pipeline from the nf-core template. +- `nf-core lint` lints the pipeline code for things that must be completed. +- `nf-core modules list local` lists modules currently installed into your pipeline. +- `nf-core modules list remote` lists modules available to install into your pipeline. +- `nf-core modules install ` installs the tool module into your pipeline. +- `nf-core modules create` creates a module locally to add custom code into your pipeline. +- `nf-core modules lint --all` lints your module code for things that must be completed. +- `nf-core schema build` opens an interface to allow you to describe your pipeline parameters and set default values, and which values are valid. +::: From 01cd56ee43ef66d01142116ea959617c3d97dd36 Mon Sep 17 00:00:00 2001 From: Mahesh Binzer-Panchal Date: Wed, 8 Nov 2023 10:38:29 +0000 Subject: [PATCH 4/9] Add details tag to gitpod section --- .../contributing/nf_core_basic_training.md | 23 +++++++++++-------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 5fc4773207..ac18f876ee 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -85,16 +85,16 @@ This training can be followed either based on this documentation alone, or via a (no such video yet) -# Using gitpod +### Gitpod -For this tutorial we are going to use Gitpod, which is best for first-timers as this platform contains all the programs and data required. -Gitpod will contain a preconfigured Nextflow development environment and has the following requirements: +For this tutorial we will use Gitpod, which runs in the learners web browser. The Gitpod environment contains a preconfigured Nextflow development environment +which includes a terminal, file editor, file browser, Nextflow, and nf-core tools. To use Gitpod, you will need: - A GitHub account - Web browser (Google Chrome, Firefox) - Internet connection -Simply click the link and log in using your GitHub account to start the tutorial: +Click the link and log in using your GitHub account to start the tutorial:

@@ -102,9 +102,12 @@ Simply click the link and log in using your GitHub account to start the tutorial

-For more information about gitpod, including how to make your own gitpod environement, see the gitpod bytesize talk on youtube (link to the bytesize talk) +For more information about Gitpod, including how to make your own Gitpod environement, see the Gitpod bytesize talk on youtube (link to the bytesize talk), +check the [nf-core Gitpod documentation](gitpod/index) or [Gitpod's own documentation](https://www.gitpod.io/docs). -## Explore your Gitpod interface +
+ Expand this section for instructions to explore your Gitpod environment +#### Explore your Gitpod interface You should now see something similar to the following: @@ -130,9 +133,10 @@ Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.8-internal+0-adhoc..src Encoding: UTF-8 (UTF-8) ``` -## Reopening a Gitpod session +#### Reopening a Gitpod session -You can reopen an environment from . Find your previous environment in the list, then select the ellipsis (three dots icon) and select Open. +When a Gitpod session is not used for a while, i.e., goes idle, it will timeout and close the interface. +You can reopen the environment from . Find your previous environment in the list, then select the ellipsis (three dots icon) and select Open. If you have saved the URL for your previous Gitpod environment, you can simply open it in your browser. @@ -142,7 +146,8 @@ If you have lost your environment, you can find the main scripts used in this tu ## Saving files from Gitpod to your local machine -To save any file from the explorer panel, right-click the file and select Download. +To save any file locally from the explorer panel, right-click the file and select Download. +
# Explore nf-core/tools From 5da47124949e6da18d9f93152c4eb539523474c2 Mon Sep 17 00:00:00 2001 From: Mahesh Binzer-Panchal Date: Wed, 15 Nov 2023 10:26:18 +0000 Subject: [PATCH 5/9] Formatting and section header update --- .../contributing/nf_core_basic_training.md | 43 ++++++++++--------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index ac18f876ee..7a11c475be 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -3,8 +3,6 @@ title: Basic training to create an nf-core pipeline subtitle: A guide to create Nextflow pipelines using nf-core tools --- -# Introduction - ## Scope - How do I create a pipeline using nf-core tools? @@ -47,39 +45,43 @@ The course is going to build an (totally unscientific and useless) RNA seq pipel The following sections will be handled in the course: -**1. Setting up the gitpod environment for the course** +1. **Setting up the gitpod environment for the course** + + The course is using gitpod in order to avoid the time expense for downloading and installing tools and data. -The course is using gitpod in order to avoid the time expense for downloading and installing tools and data. +2. **Exploring the nf-core tools command** -**2. Exploring the nf-core tools command** + A very basic walk-through of what can be done with nf-core tools -A very basic walk-through of what can be done with nf-core tools +3. **Creating a new nf-core pipeline from the nf-core template** -**3. Creating a new nf-core pipeline from the nf-core template** +4. **Exploring the nf-core template** -**4. Exploring the nf-core template** + a) The git repository -a) The git repository + b) running the pipeline -b) running the pipeline + c) linting the pipeline -c) linting the pipeline + d) walk-through of the template files -d) walk-through of the template files +5. **Building a nf-core pipeline using the template** -**5. Building a nf-core pipeline using the template** + a) Adding a nf-core module to your pipeline -a) Adding a nf-core module to your pipeline + b) Adding a local custom module to your pipeline -b) Adding a local custom module to your pipeline + c) Working with Nextflow schema -c) Working with Nextflow schema + d) Linting your modules -d) Linting your modules +## Preparation -## Prerequisites +### Prerequisites -## Follow the training videos +- Familiarity with Nextflow syntax and configuration. + +### Follow the training videos This training can be followed either based on this documentation alone, or via a training video hosted on youtube. You can find the youtube video in the Youtube playlist below: @@ -107,6 +109,7 @@ check the [nf-core Gitpod documentation](gitpod/index) or [Gitpod's own document
Expand this section for instructions to explore your Gitpod environment + #### Explore your Gitpod interface You should now see something similar to the following: @@ -149,7 +152,7 @@ If you have lost your environment, you can find the main scripts used in this tu To save any file locally from the explorer panel, right-click the file and select Download.
-# Explore nf-core/tools +## Explore nf-core/tools The nf-core/tools package is already installed in the gitpod environment. Now you can check out which pipelines, subworkflows and modules are available via tools. To see all available commands of nf-core tools, run the following: From dd7007d86ee978dea7d2f922ba2bee150a04a3b0 Mon Sep 17 00:00:00 2001 From: Mahesh Binzer-Panchal Date: Wed, 15 Nov 2023 10:34:53 +0000 Subject: [PATCH 6/9] Adjust headers and fix remote->local module --- .../contributing/nf_core_basic_training.md | 27 ++++++++++--------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 7a11c475be..4b14019144 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -147,9 +147,10 @@ Alternatively, you can start a new workspace by following the Gitpod URL: ## Explore nf-core/tools @@ -162,7 +163,7 @@ nf-core --help We will touch on most of the commands for developers later throughout this tutorial. -# Create a pipeline from template +## Create a pipeline from template To get started with your new pipeline, run the create command: @@ -310,7 +311,7 @@ When creating a new repository on GitHub, create it as an empty repository witho ::: -## Run the new pipeline +### Run the new pipeline The new pipeline should run with Nextflow, right out of the box. Let’s try: @@ -321,7 +322,7 @@ nextflow run nf-core-demotest/ -profile test,docker --outdir test_results This basic template pipeline contains already the FastQC and MultiQC modules, which do run on a selection of test data. -## Template code walk through +### Template code walk through Now let us have a look at the files that were generated within the `nf-core-demotest` directory when we created this pipeline. You can see all files and directories either on the left hand side in the Explorer, or by running the command: @@ -701,11 +702,11 @@ nf-core lint ::: -# Adding Modules to a pipeline +## Adding Modules to a pipeline -## Adding an existing nf-core module +### Adding an existing nf-core module -### Identify available nf-core modules +#### Identify available nf-core modules The nf-core pipeline template comes with a few nf-core/modules pre-installed. You can list these with the command below: @@ -771,7 +772,7 @@ You can list all of the modules available on nf-core/modules via the command bel nf-core modules list remote ``` -### Install a remote nf-core module +#### Install a remote nf-core module To install a remote nf-core module from the website, you can first get information about a tool, including the installation command by executing: @@ -980,9 +981,9 @@ comparison to simple nextflow pipeline from the basic Nextflow training would be ::: -## Adding a remote module +### Adding a local module -If there is no nf-core module available for the software you want to include, the nf-core tools package can also aid in the generation of a remote module that is specific for your pipeline. To add a remote module run the following: +If there is no nf-core module available for the software you want to include, the nf-core tools package can also aid in the generation of a local module that is specific for your pipeline. 
To add a local module run the following: ``` nf-core modules create @@ -990,7 +991,7 @@ nf-core modules create Open ./modules/local/demo/module.nf and start customising this to your needs whilst working your way through the extensive TODO comments! -### Making a remote module for a custom script +### Making a local module for a custom script :::tip{title="Exercise 6 - Adding a custom module"} In the directory `exercise_6` you will find the custom script `print_hello.py`, which will be used for this and the next exercise. @@ -1019,7 +1020,7 @@ As well as the pipeline template you can lint individual or all modules with a s nf-core modules lint --all ``` -# Nextflow Schema +## Nextflow Schema All nf-core pipelines can be run with --help to see usage instructions. We can try this with the demo pipeline that we just created: @@ -1028,7 +1029,7 @@ cd ../ nextflow run nf-core-demo/ --help ``` -## Working with Nextflow schema +### Working with Nextflow schema If you peek inside the nextflow_schema.json file you will see that it is quite an intimidating thing. The file is large and complex, and very easy to break if edited manually. From 47c893b7a225542ef47aef324ebd1cabbe7014aa Mon Sep 17 00:00:00 2001 From: Mahesh Binzer-Panchal Date: Tue, 5 Dec 2023 13:55:42 +0000 Subject: [PATCH 7/9] Fix formatting and rebase --- .../contributing/nf_core_basic_training.md | 30 +++++++++++-------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 4b14019144..a6145a9a09 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -10,6 +10,7 @@ subtitle: A guide to create Nextflow pipelines using nf-core tools - How can I use custom code in my pipeline? :::note + ### Learning objectives - The learner will create a simple pipeline using the nf-core template. @@ -18,6 +19,7 @@ subtitle: A guide to create Nextflow pipelines using nf-core tools - The learner will incorporate modules from nf-core/modules into their pipeline. - The learner will add custom code as a local module into their pipeline. - The learner will build an nf-core schema to describe and validate pipeline parameters. + ::: This training course aims to demonstrate how to build an nf-core pipeline using the nf-core pipeline template and nf-core modules as well as custom, local modules. Be aware that we are not going to explain any fundamental Nextflow concepts, as such we advise anyone taking this course to have completed the [Basic Nextflow Training Workshop](https://training.nextflow.io/). @@ -47,33 +49,33 @@ The following sections will be handled in the course: 1. **Setting up the gitpod environment for the course** - The course is using gitpod in order to avoid the time expense for downloading and installing tools and data. +The course is using gitpod in order to avoid the time expense for downloading and installing tools and data. 2. **Exploring the nf-core tools command** - A very basic walk-through of what can be done with nf-core tools +A very basic walk-through of what can be done with nf-core tools 3. **Creating a new nf-core pipeline from the nf-core template** 4. **Exploring the nf-core template** - a) The git repository + a) The git repository - b) running the pipeline + b) running the pipeline - c) linting the pipeline + c) linting the pipeline - d) walk-through of the template files + d) walk-through of the template files 5. 
**Building a nf-core pipeline using the template** - a) Adding a nf-core module to your pipeline + a) Adding a nf-core module to your pipeline - b) Adding a local custom module to your pipeline + b) Adding a local custom module to your pipeline - c) Working with Nextflow schema + c) Working with Nextflow schema - d) Linting your modules + d) Linting your modules ## Preparation @@ -829,9 +831,9 @@ versions (file) │File containing software versions │versions.yml
solution 2 - ``` - nf-core modules list local - ``` + ``` + nf-core modules list local + ```
::: @@ -1075,6 +1077,7 @@ Here in the schema editor you can edit: ::: :::note + ### Key points - `nf-core create ` creates a pipeline from the nf-core template. @@ -1085,4 +1088,5 @@ Here in the schema editor you can edit: - `nf-core modules create` creates a module locally to add custom code into your pipeline. - `nf-core modules lint --all` lints your module code for things that must be completed. - `nf-core schema build` opens an interface to allow you to describe your pipeline parameters and set default values, and which values are valid. + ::: From 28eb330ac19f2cd9da619216273dc9668d73e9f8 Mon Sep 17 00:00:00 2001 From: Raquel Manzano Date: Tue, 16 Jan 2024 10:29:05 +0000 Subject: [PATCH 8/9] Added more detailed info and solved Fran's requests --- .../contributing/nf_core_basic_training.md | 207 ++++++++++++++++-- 1 file changed, 193 insertions(+), 14 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index a89f27dedd..8ac751e858 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -158,7 +158,7 @@ echo "Edited" >> CODE_OF_CONDUCT.md nf-core lint ``` -# Adding Modules to a pipeline +## Adding Modules to a pipeline A module is a single `process` built to be reusable and self-contained so it can be used within different Nextflow pipelines. They encapsulate a specific function or task, for example running a single tool such as [`FastQC`](https://github.com/nf-core/modules/blob/master/modules/nf-core/fastqc/main.nf). You can import and use modules like functions in a Nextflow subworkflow, this makes your workflow more readable and maintainable. @@ -220,15 +220,47 @@ Open ./modules/local/demo/module.nf and start customising this to your needs whi ### Making a remote module for a custom script -To generate a module for a custom script you need to follow the same steps when adding a remote module. Then, you can supply the command for your script in the `script` block but your script needs to be present and *executable* in the `bin` folder of the pipeline. In the nf-core pipelines, this folder is in the main directory and you can see in [`rnaseq`](https://github.com/nf-core/rnaseq). Let's look at a example in this pipeline, for instance [`tximport.r`](https://github.com/nf-core/rnaseq/blob/master/bin/tximport.r). This is an Rscript present in the [`bin`](https://github.com/nf-core/rnaseq/tree/master/bin) of the pipeline. We can find the module that runs this script in [`modules/local/tximport`](https://github.com/nf-core/rnaseq/blob/master/modules/local/tximport/main.nf). As we can see the script is being called in the `script` block, note that `tximport.r` is being executed as if it was called from the command line and therefore needs to be *executable*. - -Let's create a simple custom script that converts a MAF file to a BED file called `maf2bed.py`: +To generate a module for a custom script you need to follow the same steps when adding a remote module. +Then, you can supply the command for your script in the `script` block but your script needs to be present +and *executable* in the `bin` +folder of the pipeline. +In the nf-core pipelines, +this folder is in the main directory and you can see in [`rnaseq`](https://github.com/nf-core/rnaseq). +Let's look at an publicly available example in this pipeline, +for instance [`tximport.r`](https://github.com/nf-core/rnaseq/blob/master/bin/tximport.r). 
This is an Rscript present in the [`bin`](https://github.com/nf-core/rnaseq/tree/master/bin) directory of the pipeline.
We can find the module that runs this script in
[`modules/local/tximport`](https://github.com/nf-core/rnaseq/blob/master/modules/local/tximport/main.nf).
As we can see, the script is called in the `script` block. Note that `tximport.r` is
executed as if it were called from the command line, and therefore needs to be *executable*.
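Nextflow adds the pipeline's `bin/` directory to the `PATH` of every task, which is why such a script can be called by name alone inside the `script` block. If the executable bit is missing, you can set it yourself (a generic shell command; the path is illustrative):

```bash
chmod +x bin/tximport.r
```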
+ +

TL;DR

+ +1. Write your script on any language (python, bash, R, + ruby). E.g. `maf2bed.py` +2. If not there yet, move your script to `bin` folder of + the pipeline and make it + executable (`chmod +x `) +3. Create a module with a single process to call your script from within the workflow. E.g. `./modules/local/convert_maf2bed/main.nf` +4. Include your new module in your workflow with the command `include {CONVERT_MAF2BED} from './modules/local/convert_maf2bed/main'` that is written before the workflow call. +
+ +_Tip: Try to follow best practices when writing a script for + reproducibility and maintenance purposes: add the + shebang (e.g. `#!/usr/bin/env python`), and a header + with description and type of license._ + +### 1. Write your script +Let's create a simple custom script that converts a MAF file to a BED file called `maf2bed.py` and place it in the bin directory of our nf-core-testpipeline:: ``` #!/usr/bin/env python -""" +"""bash title="maf2bed.py" Author: Raquel Manzano - @RaqManzano Script: Convert MAF to BED format keeping ref and alt info +License: MIT """ import argparse import pandas as pd @@ -243,7 +275,6 @@ def argparser(): ) return parser.parse_args() - def maf2bed(maf_file, bed_file, extra): maf = pd.read_csv(maf_file, sep="\t", comment="#") bed = maf[["Chromosome", "Start_Position", "End_Position"] + extra] @@ -260,30 +291,135 @@ if __name__ == "__main__": ``` -Now, let's go back to our module: +### 2. Make sure your script is in the right folder +Now, let's move it to the correct directory: + +``` +mv maf2bed.py /path/where/pipeline/is/bin/. +chmod +x /path/where/pipeline/is/bin/maf2bed.py +``` + +### 3. Create your custom module +Then, let's write our module. We will call the process +"CONVERT_MAF2BED" and add any tags or/and labels that +are appropriate (this is optional) and directives (via +conda and/or container) for +the definition of dependencies. + + +
+More info on labels +A `label` will +annotate the processes with a reusable identifier of your +choice that can be used for configuring. E.g. we use the +`label` 'process_single', this looks as follows: + +``` +withLabel:process_single { + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 1.GB * task.attempt, 'memory') } + time = { check_max( 1.h * task.attempt, 'time' ) } + } +``` +
+ +
+More info on tags + +A `tag` is simple a user provided identifier associated to +the task. In our process example, the input is a tuple +comprising a hash of metadata for the maf file called +`meta` and the path to the `maf` file. It may look +similar to: `[[id:'123', data_type:'maf'], +/path/to/file/example.maf]`. Hence, when nextflow makes +the call and `$meta.id` is `123` name of the job +will be "CONVERT_MAF2BED(123)". If `meta` does not have +`id` in its hash, then this will be literally `null`. + +
+ +
+More info on conda/container directives + +The `conda` directive allows for the definition of the +process dependencies using the [Conda package manager](https://docs.conda.io/en/latest/). Nextflow automatically sets up an environment for the given package names listed by in the conda directive. For example: + +``` +process foo { + conda 'bwa=0.7.15' + + ''' + your_command --here + ''' +} +``` +Multiple packages can be specified separating them with a blank space e.g. `bwa=0.7.15 samtools=1.15.1`. The name of the channel from where a specific package needs to be downloaded can be specified using the usual Conda notation i.e. prefixing the package with the channel name as shown here `bioconda::bwa=0.7.15` + +``` +process foo { + conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.15.1' + + ''' + your_bwa_cmd --here + your_samtools_cmd --here + ''' +} +``` +Similarly, we can apply the `container` directive to execute the process script in a [Docker](http://docker.io/) or [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) container. When running Docker, it requires the Docker daemon to be running in machine where the pipeline is executed, i.e. the local machine when using the local executor or the cluster nodes when the pipeline is deployed through a grid executor. + +``` +process foo { + conda 'bioconda::bwa=0.7.15 bioconda::samtools=1.15.1' + container 'dockerbox:tag' + + + ''' + your_bwa_cmd --here + your_samtools_cmd --here + ''' +} +``` + +Additionally, the `container` directive allows for a more sophisticated choice of container and if it Docker or Singularity depending on the users choice of container engine. This practice is quite common on official nf-core modules. +``` +process foo { + conda "bioconda::fastqc=0.11.9" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' : + 'biocontainers/fastqc:0.11.9--0' }" + + ''' + your_fastqc_command --here + ''' +} +``` +
Since `maf2bed.py` is in the `bin` directory, we can call it directly in the script block of our new module `CONVERT_MAF2BED`. You only have to be careful with how you reference variables: use `${variable}` when the variable name has to be delimited from surrounding text, and `$variable` otherwise.

A process may contain any of the following definition blocks: directives, inputs, outputs, when clause, and the process script. Here is how we write it:

```groovy
process CONVERT_MAF2BED {
    // HEADER
    tag "$meta.id"
    label 'process_single'

    // DEPENDENCIES DIRECTIVES
    conda "anaconda::pandas=1.4.3"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/pandas:1.4.3' :
        'quay.io/biocontainers/pandas:1.4.3' }"

    // INPUT BLOCK
    input:
    tuple val(meta), path(maf)

    // OUTPUT BLOCK
    output:
    tuple val(meta), path('*.bed'), emit: bed
    path "versions.yml"           , emit: versions

    // WHEN CLAUSE
    when:
    task.ext.when == null || task.ext.when

    // SCRIPT BLOCK
    script: // maf2bed.py is bundled with the pipeline in bin/ and found via the PATH
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    maf2bed.py $args --mafin $maf --bedout ${prefix}.bed

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        python: \$(python --version | sed 's/Python //g')
    END_VERSIONS
    """
}
```

More on Nextflow's process components in the [docs](https://www.nextflow.io/docs/latest/process.html).

### Include your module in the workflow

In general, we call the Nextflow module file `main.nf` and save it in the `modules/local` folder, inside a directory named after the module, here `convert_maf2bed`. If you believe your custom script could be useful for others, because it is potentially reusable or calls a tool that is not yet present in nf-core modules, you can start the process of making it official by adding a `meta.yml` [as explained above](#adding-modules-to-a-pipeline) that describes the module and its inputs and outputs. The overall tree for the pipeline skeleton will look as follows:

```
pipeline/
├── bin/
│   └── maf2bed.py
├── modules/
│   ├── local/
│   │   └── convert_maf2bed/
│   │       ├── main.nf
│   │       └── meta.yml
│   └── nf-core/
├── conf/
│   ├── base.config
│   └── modules.config
...
```

To use our custom module located in `./modules/local/convert_maf2bed` within our workflow, we add a module inclusion command as follows (this has to be done before we invoke the workflow):

```groovy
include { CONVERT_MAF2BED } from './modules/local/convert_maf2bed/main'

workflow {
    // a single [ meta, file ] pair as illustrative input
    input_data = [ [ id:'123', data_type:'maf' ], file('/path/to/maf/example.maf') ]
    CONVERT_MAF2BED(input_data)
}
```

### Other notes

#### What happens if I want to use containers, but there is no image with the packages I need?

No worries, this can be done fairly easily thanks to [BioContainers](https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html), see the instructions [here](https://github.com/BioContainers/multi-package-containers). If you see the combination that you need in the repo, you can also use [this website](https://midnighter.github.io/mulled) to find out the "mulled" name of this container.

#### I want to know more about software dependencies!

You are in luck, we have more documentation [here](https://nf-co.re/docs/contributing/modules#software-requirements).

#### I want to know more about modules!

See more info about modules in the nf-core docs [here](https://nf-co.re/docs/contributing/modules#software-requirements).
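One more practical note on this module: because it reads `task.ext.args` and `task.ext.prefix`, its behaviour can be adjusted from `conf/modules.config` without touching the module code. A minimal sketch, under the assumption that you want to keep the reference and alternate allele columns (the option values are illustrative):

```groovy
process {
    withName: 'CONVERT_MAF2BED' {
        // forwarded to maf2bed.py via $args in the script block
        ext.args   = '--extra Reference_Allele Tumor_Seq_Allele2'
        // changes the output name from <id>.bed to <id>.maf2bed.bed
        ext.prefix = { "${meta.id}.maf2bed" }
    }
}
```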
## Lint all modules From 438f8b60adc6d3c1416d10ed0d271126a214d9d1 Mon Sep 17 00:00:00 2001 From: scheckley <2563006+scheckley@users.noreply.github.com> Date: Tue, 19 Mar 2024 11:23:05 +0000 Subject: [PATCH 9/9] Update subworkflows.md added link to meta maps in the "what is the meta map" section. --- src/content/docs/contributing/subworkflows.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/content/docs/contributing/subworkflows.md b/src/content/docs/contributing/subworkflows.md index 82028e57d6..d10f025287 100644 --- a/src/content/docs/contributing/subworkflows.md +++ b/src/content/docs/contributing/subworkflows.md @@ -388,9 +388,7 @@ nextflow run tests/subworkflows/nf-core/ -entry test_ +The meta variable can be passed down to processes as a tuple of the channel containing the actual samples, e.g. FastQ files, and the meta variable. The `meta map` is a [groovy map](https://www.tutorialspoint.com/groovy/groovy_maps.htm), which is like a python dictionary. Additional documentation on meta maps is available [here](https://nf-co.re/docs/contributing/modules#what-is-the-meta-map). ## Help