Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script recommendations #2888

Merged
merged 15 commits into from
Dec 18, 2024
Merged
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions sites/docs/src/content/docs/guidelines/components/modules.md
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,73 @@ See the [Bash manual on file operators](https://tldp.org/LDP/abs/html/fto.html)

Alternate suggestions include using `grep -c` to search for a valid string match, or other tool which will appropriately error when the expected output is not successfully created.

### Script inclusion

:::warning
You SHOULD NOT use [Nextflow module binaries](https://www.nextflow.io/docs/latest/module.html#module-binaries), as these are not fully portable across all execution contexts.
:::
jfy133 marked this conversation as resolved.
Show resolved Hide resolved

:::note
Where script content in a module becomes particularly extensive, we strongly encourage packaging and hosting the code externally and provisioning via Conda/Docker as a standalone tool(kit).
:::

If a module's `script:` block contains a script rather than command invocations, regardless of the language (e.g., Bash, R, Python), and the content is more than 20 lines, it MUST be provided through a [Nextflow module template](https://www.nextflow.io/docs/latest/module.html#module-templates).
Using module templates helps distinguish between changes made to the scientific logic within the script and those affecting the workflow-specific logic in the module. This separation improves the code's clarity and maintainability.

christopher-hakkaart marked this conversation as resolved.
Show resolved Hide resolved
#### Inline script code

If the script content is less than 20 lines, the code MAY be embedded directly in the module without a dedicated template file. However, they should still follow the guidance content as with a template.
jfy133 marked this conversation as resolved.
Show resolved Hide resolved

#### Module template location

The template MUST go into a directory called `templates/` in the same directory as the module `main.nf`.

The template file MUST be named after the module itself with a language-appropriate file suffix. For example, the `deseq2/differential` nf-core module will use the `deseq2_differential.R`.

The template file can then be referred to within the module using the template function:

```nextflow
script:
template 'deseq2_differential.R'
```

See [`deseq2/differential`](https://github.com/nf-core/modules/blob/master/modules/nf-core/deseq2/differential/main.nf#L47) for an example of a template in an nf-core pipeline.

The resulting structure would look like this.

```tree
deseq2
└── differential
├── environment.yml
├── main.nf
├── meta.yml
├── templates
│   └── deqseq2_differential.R
└── tests
├── main.nf.test
├── main.nf.test.snap
└── tags.yml
```

#### Template or inline script-code contents

:::warning
Be aware that in any script template that Nextflow needs to be escaped in the same way you would in a standard bash `script:` block!
:::

The script template file or inline script code (<20 lines) MUST generate a `versions.yml` file in the language-appropriate way that contains versions of the base language and all relevant libraries and packages.
jfy133 marked this conversation as resolved.
Show resolved Hide resolved

The generated `versions.yml` MUST have the same structure as a standard nf-core module `versions.yml`.

See the [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/templates/deseq_de.R#L509-L534) for an example using R.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

Also an alternative suggestion for writing the R version:

paste(version[['major']],version[['minor']], sep = ".")

Think it's more easier to read then:

strsplit(version[['version.string']], ' ')[[1]][3]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

What do you mean? I don't follow, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second comment @Joon-Klaps please feel free to update the moduel/example :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the mentioned example lines of code there is a small section dedicated into exporting the R.sessionInfo().

sink(paste(opt\$output_prefix, "R_sessionInfo.log", sep = '.'))
print(sessionInfo())
sink()

However, nothing is mentioned of this in these guidelines. So is it necessary or not? I'm guessing so given it was included in the referenced lines but would also explicitly mention it then in the guidelines. Although not certain about the added value of the R.session() info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got you. I don't think that makes sense putting 'how' to generate the yaml file in the guideline per se - we only care that it exists.

We can point to other examples though if it helps (feel free to make an additional PR). Maybe a dedicated tutorial page how to write such a template module would make sense due to the intricacies of the variable escaping? What do you think @pinin4fjords ?


#### Stubs in templated modules

A templated module MUST have a stub block in the same way as any other module. For example, using `touch` to generate empty files and versions. See [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L34-L49) for an example in an nf-core module.

An inline command to call the version for libraries for the `versions.yml` MAY be used in this case.
For an R example see [deseq2/differential](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L47).

### Stubs

#### Stub block must exist
Expand Down
Loading