---
title: Leveraging nf-test for enhanced quality control in nf-core
date: 2024-03-28
type: post
description: An introduction to nf-test and how it is being adopted across nf-core modules, subworkflows, and pipelines.
image: img/blog-2024-03-14--share.jpg
tags: nextflow,nf-core,nf-test
status: published
author: Carson Miller & Sateesh Peri
icon: carson_sateesh.jpg
---

# The ever-changing landscape of bioinformatics

Reproducibility is an important attribute of all good science. This is especially true in bioinformatics, where software is (hopefully) updated constantly and pipelines are (ideally) actively maintained. Improvements and maintenance are great, but they also raise an important question: do bioinformatics tools and pipelines continue to run successfully and produce consistent results despite these changes? Fortunately, there is a well-established way to safeguard software reproducibility: testing.

<!-- end-archive-description -->

# The Wonderful World of Testing

> "Software testing is the process of evaluating and verifying that a software product does what it is supposed to do,"
> Lukas Forer, co-creator of nf-test.

Software testing has two primary purposes: determining whether an operation still runs successfully after changes are made, and comparing its outputs to check that they are consistent. When an output changes, a test alerts the developer so that an appropriate fix can be made. Admittedly, some output changes are intentional (e.g., improving a tool might lead to better, and therefore different, results). Even in these scenarios, however, it is important to know exactly what has changed, so that unintended changes do not slip in alongside the intended ones.
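
The two purposes can be illustrated with a small, tool-agnostic sketch. The Python below is not how nf-test works internally; it assumes a hypothetical `StepResult` record holding a step's exit code and output file contents, plus a dictionary of MD5 checksums taken from an earlier, trusted run. A step passes only if it exits successfully *and* every expected output is present with an unchanged checksum.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """Hypothetical record of one step's run: exit code plus output file contents."""
    exit_code: int
    outputs: dict[str, bytes] = field(default_factory=dict)  # file name -> contents


def md5_hex(data: bytes) -> str:
    """MD5 checksum of a file's contents, as a hex string."""
    return hashlib.md5(data).hexdigest()


def check_step(result: StepResult, expected_md5: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the test passed."""
    problems = []
    # Purpose 1: the step must still run to completion after the change.
    if result.exit_code != 0:
        problems.append(f"step failed with exit code {result.exit_code}")
    # Purpose 2: every expected output must exist with an unchanged checksum.
    for name, digest in expected_md5.items():
        if name not in result.outputs:
            problems.append(f"missing output: {name}")
        elif md5_hex(result.outputs[name]) != digest:
            problems.append(f"output changed: {name}")
    return problems


baseline = {"sample.txt": md5_hex(b"ACGT\n")}  # recorded from a trusted run
print(check_step(StepResult(0, {"sample.txt": b"ACGT\n"}), baseline))  # []
print(check_step(StepResult(0, {"sample.txt": b"ACGA\n"}), baseline))  # ['output changed: sample.txt']
```

An intentional change, such as the improved tool described above, shows up as `output changed: ...`; the developer reviews the difference and, if it is expected, records the new checksum as the baseline.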

# Writing effective tests

Although any test is better than no test at all, there are several considerations to keep in mind when adding tests to pipelines and/or tools to maximize their effectiveness. These considerations fall broadly into two groups:

1. Which inputs/functionalities should be tested?
2. What contents should be tested?

## Consideration 1: Testing inputs/functionality

Generally, software has a default or most common use case. For instance, the nf-core [FastQC](https://nf-co.re/modules/fastqc) module is commonly used to assess the quality of paired-end reads in FastQ format. However, this is not the only way to use the FastQC module: inputs can also be single-end or interleaved FastQ files, BAM files, or reads from multiple samples. FastQC handles each input type differently, so to increase your test coverage (["the degree to which a test or set of tests exercises a particular program or system"](https://www.geeksforgeeks.org/test-design-coverage-in-software-testing/)), a test should be written for each possible input. Settings can also change how a process is executed. For example, in the [bowtie2/align](https://nf-co.re/modules/bowtie2_align) module, the `save_unaligned` and `sort_bam` parameters, in addition to the input files, alter how the module behaves and which outputs it generates, so each of these scenarios deserves its own test. When writing tests, aim to consider as many variations as possible. If some are missed, don't worry! Additional tests can be added later. Discovering these use cases and working out how to test them is part of the development process.
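
One way to avoid missing scenarios is to enumerate them systematically. The following Python sketch builds one test case for every combination of input type and parameter value; the helper `enumerate_test_cases` is hypothetical and not part of nf-test or nf-core tools. The FastQC input types come from the paragraph above, while the single-end/paired-end layouts used for bowtie2/align are an assumption for illustration.

```python
from itertools import product


def enumerate_test_cases(**dimensions: list) -> list[dict]:
    """Return one test case per combination of the given dimensions.

    Each keyword is an input type or parameter, and its value lists the
    settings worth exercising for that dimension.
    """
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


# FastQC: a single dimension, the input type (types named in the text above).
fastqc_cases = enumerate_test_cases(
    reads=["paired_end", "single_end", "interleaved", "bam", "multiple_samples"],
)

# bowtie2/align: read layout (assumed for illustration) plus its two options.
bowtie2_cases = enumerate_test_cases(
    reads=["single_end", "paired_end"],
    save_unaligned=[False, True],
    sort_bam=[False, True],
)

print(len(fastqc_cases), len(bowtie2_cases))  # 5 8
print(bowtie2_cases[0])  # {'reads': 'single_end', 'save_unaligned': False, 'sort_bam': False}
```

A full cross product grows quickly (here 2 × 2 × 2 = 8 cases for bowtie2/align), so in practice you might drop combinations that cannot change the outputs; starting from the full list makes those omissions deliberate rather than accidental.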

## Consideration 2: Testing outputs

Once test cases are established, the next step is determining what specifically should be evaluated in each test. These evaluations are generally referred to as assertions. Assertions can range from verifying that a job completed successfully to comparing the contents of output channels/files between runs. Ideally, tests should cover all outputs, although this is not always feasible (for example, when a file embeds timestamps or other values that change on every run). In such cases, it is often best to include at least a portion of the problematic file's contents or, at a minimum, the file name to ensure that the file is consistently produced.
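
The fallback levels described above can be sketched as three checks of decreasing strictness. This Python example only illustrates the idea (nf-test and nf-core have their own helpers for this); the function names and the `Date:` line standing in for run-specific content are hypothetical.

```python
import hashlib
import re


def full_match(contents: str, expected_md5: str) -> bool:
    """Strongest check: the whole file is unchanged."""
    return hashlib.md5(contents.encode()).hexdigest() == expected_md5


def stable_lines(contents: str, volatile: re.Pattern) -> list[str]:
    """Drop lines that legitimately change between runs (e.g. timestamps)."""
    return [line for line in contents.splitlines() if not volatile.search(line)]


def partial_match(contents: str, expected_lines: list[str], volatile: re.Pattern) -> bool:
    """Middle ground: compare only the stable portion of the file."""
    return stable_lines(contents, volatile) == expected_lines


def name_match(output_names: list[str], expected_name: str) -> bool:
    """Weakest check: the file is at least still produced under the same name."""
    return expected_name in output_names


# Example: a report whose first line is a run date.
VOLATILE = re.compile(r"^Date:")
run1 = "Date: 2024-03-01\nreads\t1000\n"
run2 = "Date: 2024-03-28\nreads\t1000\n"
baseline_md5 = hashlib.md5(run1.encode()).hexdigest()
print(full_match(run2, baseline_md5))                               # False: the date changed
print(partial_match(run2, stable_lines(run1, VOLATILE), VOLATILE))  # True: stable content unchanged
print(name_match(["report.txt"], "report.txt"))                     # True
```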

# Testing in nf-core

nf-core is a community-driven initiative that aims to provide high-quality, Nextflow-based bioinformatics pipelines. The community's emphasis on reproducibility makes testing an essential aspect of the nf-core ecosystem. Until recently, tests were implemented using pytest for modules/subworkflows and test profiles for pipelines. These tests ensured that nf-core components could run successfully following updates. However, at the pipeline level, they did not check file contents to evaluate output consistency. Additionally, using two different testing approaches lacked the standardization nf-core strives for. An ideal test framework would integrate tests at all Nextflow development levels (functions, modules, subworkflows, and pipelines) and comprehensively test outputs. nf-test addresses these reproducibility and standardization concerns by checking the contents of output files on each run and by providing one consistent framework for testing all nf-core components, from functions to pipelines.

# New and Improved Nextflow Testing with nf-test

Created by [Lukas Forer](https://github.com/lukfor) and [Sebastian Schönherr](https://github.com/seppinho), nf-test has emerged as the leading solution for testing Nextflow pipelines. Their goal was to improve how reproducibility is evaluated in complex Nextflow pipelines. To this end, they implemented several notable features that together make a robust testing platform:

- **Comprehensive Output Testing**: nf-test employs [snapshots](https://www.nf-test.com/docs/assertions/snapshots/) for handling complex data structures. This feature evaluates the contents of any specified channel/file output, enabling comprehensive and reliable tests that ensure data integrity through changes.
- **A Consistent Testing Framework for All Nextflow Components**: nf-test provides a unified framework for testing everything from individual functions to entire pipelines, ensuring consistency across all components.
- **A DSL for Tests**: Modeled on Nextflow, nf-test's intuitive domain-specific language (DSL) uses `when` and `then` blocks to describe the expected behavior of a pipeline, making test scripts easier to write.
- **Readable Assertions**: nf-test offers a wide range of functions for writing clear and understandable [assertions](https://www.nf-test.com/docs/assertions/assertions/), improving the clarity and maintainability of tests.
- **Boilerplate Code Generation**: To speed up test writing, both nf-test and nf-core tools provide commands that generate boilerplate code, streamlining the development of new tests.
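
nf-test's snapshots follow a record-then-compare pattern: the first run stores the values passed to `snapshot(...).match()` in a `.nf.test.snap` file, later runs compare against it, and `nf-test test --update-snapshot` refreshes it when a change is intentional. The Python class below is a simplified illustration of that lifecycle under our own assumptions (a single JSON file, JSON-serializable values, placeholder checksums); it is not nf-test's implementation, and `SnapshotStore` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


class SnapshotStore:
    """Hypothetical record-then-compare snapshot store backed by one JSON file.

    The first time a named snapshot is seen, its value is recorded and the check
    passes; afterwards the stored value is the reference. With update=True,
    stored values are overwritten, for when an output change is intentional.
    """

    def __init__(self, path: Path, update: bool = False):
        self.path = path
        self.update = update
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def match(self, name: str, value) -> bool:
        # Round-trip through JSON so in-memory values compare the same way
        # as values read back from disk (e.g. tuples become lists).
        value = json.loads(json.dumps(value))
        if self.update or name not in self.data:
            self.data[name] = value  # record (or re-record) the reference
            self.path.write_text(json.dumps(self.data, indent=2, sort_keys=True))
            return True
        return self.data[name] == value  # compare against the stored reference


# Demo: four runs against the same snapshot file.
with tempfile.TemporaryDirectory() as tmp:
    snap_file = Path(tmp) / "example.snap.json"
    first = SnapshotStore(snap_file).match("fastqc", {"html": "abc123"})    # recorded
    same = SnapshotStore(snap_file).match("fastqc", {"html": "abc123"})     # unchanged
    changed = SnapshotStore(snap_file).match("fastqc", {"html": "def456"})  # mismatch
    updated = SnapshotStore(snap_file, update=True).match("fastqc", {"html": "def456"})
    print(first, same, changed, updated)  # True True False True
```

Keeping the reference values in a file next to the test means an intentional output change appears as a reviewable diff of that file in the pull request.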

# But wait… there's more!

The benefits of a consistent and comprehensive testing platform are significantly amplified by nf-test's integration into nf-core. This integration provides an abundance of resources for incorporating nf-test into your Nextflow development: you can run common nf-test commands via nf-core tools and install nf-core modules/subworkflows that already ship with tests. Moreover, a growing collection of examples is available to guide you in adopting nf-test for your own projects.

# Adding nf-test to pipelines

Several nf-core pipelines have begun to adopt nf-test as their testing framework. Among these, methylseq was the first to implement pipeline-level nf-tests as a proof of concept. Since this initial implementation, however, nf-core maintainers have found that the existing nf-core pipeline template needs modifications to better support nf-test. These adjustments aim to improve compatibility with nf-test across components (modules, subworkflows, workflows) and to ensure that tests are included and shipped with each component. A more detailed blog post about these changes will be published in the future.

Building on these insights, fetchngs has been at the forefront of using nf-test to test modules, subworkflows, workflows, and the pipeline as a whole. Currently, fetchngs serves as the best-practice example of nf-test implementation within the nf-core community. Other pipelines actively integrating nf-test include mag, sarek, readsimulator, and rnaseq.

# Pipeline development with nf-test

For newer nf-core pipelines, integrating nf-test as early as possible in development is highly recommended. One pipeline that has benefited from incorporating nf-tests throughout its development is phageannotator. Although integrating nf-test during development has presented challenges, it has offered a unique opportunity to evaluate different testing methodologies and has been instrumental in catching numerous development errors that might have been overlooked with the previous test-profile approach. Investing this time early has also made it much easier to modify different parts of the pipeline while confirming that functionality and outputs remain unaffected.

For those embarking on creating new Nextflow pipelines, here are a few key takeaways from our experience:

1. Leverage nf-core modules/subworkflows extensively. Devoting time early to contributing modules/subworkflows to nf-core not only streamlines future development for you and your PR reviewers but also simplifies maintaining, linting, and updating pipeline components through nf-core tools. Furthermore, these modules are likely to benefit others in the community with similar research interests.
2. Prioritize incremental changes over large overhauls. Small, incremental changes are almost always preferable to large, unwieldy modifications. This is particularly true when monitoring and updating nf-tests at the module, subworkflow, workflow, and pipeline levels. Introducing too many changes at once can overwhelm both developers and reviewers, making it hard to track what has been modified and what needs testing. Aim to keep changes straightforward and manageable.
3. Run nf-test in parallel to generate and check snapshots. By default, nf-test runs each test sequentially, which can make running many tests to generate or update snapshots time-consuming. Scripts that run tests in parallel, whether through a workload manager or in the cloud, can save a considerable amount of time and make it easier to monitor which tests pass and which fail.
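
As a starting point for the third takeaway, here is a Python sketch that splits test files into shards (for example, one per cluster job or cloud instance) and runs tests concurrently on a single machine. It assumes `nf-test` is on the `PATH` and invokes it as `nf-test test <file>`; the helper names and the thread-pool approach are our own illustrative choices, and the `runner` argument lets you swap in a function that submits each file to your workload manager instead.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def shard(test_files: Sequence[str], n_shards: int) -> list[list[str]]:
    """Split test files round-robin into n_shards groups (e.g. one per job).

    Sorting first keeps the assignment stable from one run to the next.
    """
    ordered = sorted(test_files)
    return [ordered[i::n_shards] for i in range(n_shards)]


def run_nf_test(test_file: str) -> int:
    """Run one nf-test file via the CLI and return its exit code (0 = passed).

    Assumes `nf-test` is installed and on PATH.
    """
    return subprocess.run(["nf-test", "test", test_file]).returncode


def run_in_parallel(
    test_files: Sequence[str],
    runner: Callable[[str], int] = run_nf_test,
    max_workers: int = 4,
) -> dict[str, bool]:
    """Run test files concurrently and return {test_file: passed}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(runner, test_files))
    return {test_file: code == 0 for test_file, code in zip(test_files, exit_codes)}
```

For example, `run_in_parallel(shard(all_tests, 4)[0])` would run the first of four shards locally and return a pass/fail map to scan for failures. Threads suffice here because each worker mostly waits on an external `nf-test` process; whether several runs can safely share one project directory depends on your setup, and submitting each shard as a separate job sidesteps that question.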

# Community and contribution

nf-core is a community that relies on consistent contributions, evaluation, and feedback from its members to improve and stay up to date. This is just as true during the transition to a new testing framework. Currently, there are two primary ways that people have been contributing to this transition:

1. **Adding nf-tests to new and existing nf-core modules/subworkflows**. There has been a recent emphasis on migrating modules/subworkflows from pytest to nf-test because of the advantages mentioned previously. Fortunately, the nf-core team has added very helpful [instructions](https://nf-co.re/docs/contributing/modules#migrating-from-pytest-to-nf-test) to the website, which have made this process much more streamlined.
2. **Adding nf-tests to nf-core pipelines**. Another area of focus is adding nf-tests to nf-core pipelines. This can be quite difficult for large, complex pipelines, but there are now several examples of pipelines with nf-tests that can serve as blueprints for getting started ([fetchngs](https://github.com/nf-core/fetchngs/tree/master), [sarek](https://github.com/nf-core/sarek/tree/master), [readsimulator](https://github.com/nf-core/readsimulator/tree/master), [phageannotator](https://github.com/nf-core/phageannotator)).

> These are great areas to work on and contribute to during the upcoming nf-core hackathon in March 2024.

Yet the community's role is not limited to adding test code. A robust testing infrastructure also needs nf-core users to identify testing errors and additional test cases, and to provide feedback so that the system can be continually improved. Each of us brings a different perspective, and the development-feedback loop that results from collaboration produces a much more effective, transparent, and inclusive system than any of us could build in isolation.

# Future directions

Looking ahead, nf-core and nf-test are poised for tighter integration and significant advancements. Anticipated developments include enhanced testing capabilities, more intuitive interfaces for writing and managing tests, and deeper integration with cloud-based resources. These improvements will further solidify the position of nf-core and nf-test at the forefront of bioinformatics workflow management.

# Conclusion

The integration of nf-test within the nf-core ecosystem marks a significant leap forward in ensuring the reproducibility and reliability of bioinformatics pipelines. By adopting nf-test, developers and researchers alike can contribute to a culture of excellence and collaboration, driving forward the quality and accuracy of bioinformatics research.

> Special thanks to everyone in the #nf-test channel in the nf-core Slack workspace for their invaluable contributions, feedback, and support throughout this adoption. We are immensely grateful for your commitment and look forward to continuing our productive collaboration.